Showing posts with label pronunciation. Show all posts

07 July 2014

Musings on confusings...

Since I first learned I had got the job in Sicily, my Spanish has suffered. The day after the job interview, I was at a Spanish/English language exchange, and I kept dropping words of Italian into my Spanish. The weird thing is that my Spanish was a million times stronger than my Italian then, but somehow my brain had switched "mode".

Obviously, living in Italy for four months has only served to intensify this, with my Spanish now being half-hidden behind fairly broken bits of Italian. My assumption for a long time was that my problem was in my accent -- I still speak Italian with a bit of a Spanish twang. This belief was bolstered by the fact that my Catalan, while being very, very weak from lack of use, didn't seem so badly affected. The Catalan accent is very, very different from Spanish and Italian.

However, I was at a Couchsurfing meeting on Friday night which changed my mind. There was an Andalusian tourist visiting, and when I spoke to her, my accent was more different from the one I use in Italian than I had expected. My brain started playing tricks on me, and I had difficulty speaking Italian when she was in my line-of-sight, and for a while I was wobbling between Italian and Spanish.

But that's not the important thing.

When I was speaking Italian, I got into much deeper and more complex conversations than I normally would, and rather than jamming up as I hit the limits of my Italian, I was automatically switching to Spanish to fill in the gaps. Now, I wasn't just importing words or grammar rules from Spanish into Italian -- no, I was switching into Spanish; conjugations, pronouns and all. As I became aware I was doing this, it dawned on me that I'd been doing it for my whole stay, but normally I'd just not thought about it too much and fallen back to English.

This is a bit of a new sensation... or actually, no. The only new thing is the fact that I was unaware of it. When it was Scottish Gaelic and French, for example, it would be instantly noticeable. The difference here is that the similarity of the languages (including, but not limited to, accent) allowed it to slip through the net on occasions.

The trigger mechanism is the same, regardless of language: hit a gap in your knowledge in one language and the brain will fall back on another. The only difference lies in detection.

This makes me wonder if the only option I have now to get my Spanish back is... to learn more Italian. My theory is that filling in the main gaps in my Italian will not only stop me falling back on Spanish when I run out of Italian, but that as a consequence of this, it will reduce the strength of the linkage between the two, allowing me to speak Spanish without Italian interrupting me.

It looks like I might be practising my Italian a lot, even once I leave Italy...

14 August 2011

Phonology -- whats and hows part II

Last time, I wrote about phonology and the necessity of physically training the tongue to produce new sounds.  However, as I pointed out, not all new phonemes require new physical skills.  Can we pick these up just by listening?  I think not, and I'd be happy to tell you how.

Meaningful sounds

The problem that I'm always trying to stress is that the brain is only interested in meaningful input -- if something has no meaning, the brain isn't interested.

This leads to some striking (and often unexpected) results. The BBC documentary Horizon showed this with colours in the programme Do You See What I See? (UK only). In the programme, you see several Himba tribespeople trying to pick out different colours on a computer screen. They show two tests -- one with a very slightly different green, which is difficult for the viewer and fairly easy for the Himba, and one with an obviously different colour... well, obvious to us, but not to the Himba.

The distinctions that the Himba find easy are ones that they have names for, and the distinctions we find easy are the ones we have names for. It would appear that the act of naming something focuses the consciousness on it, so if you tell me that a French P has a puffy sound, I'm more likely to notice it, because I know what I'm looking for.

Consider the old face/vase optical illusion: the first time you look at it, you see either the faces or the vase, and your brain fixates on that single image. If someone else tells you about the other picture, you struggle to see it at first, because your brain already sees something meaningful in the image. But once your brain finally sees the second image, you can change your mental focus between the two meaningful images at will.

But that example doesn't say much about subjectivity and objectivity, because the two objects are fairly arbitrary. A better example would be one where you can predict what the viewer will see based on simple demographic information. Maybe adults vs children, like this painting, where adults immediately see a particular image and children see a different one. (View the picture, and then read the explanation on the page.  I saw the second picture without reading the explanation, but only because I could understand the French label on the bottle....)

So what is meaningful to us is normally a matter of past experience and expectation. When it comes to meaningful sounds, past experience and expectation all come from the languages we already speak.  So it would follow that we need to consciously draw the student's attention to the differences, or they're just not likely to notice them.

What do we need to draw their attention to?

The phoneme is not the minimal unit of sound

The phoneme is often mistakenly considered the atomic unit of pronunciation in a language, but most languages build their phonemes out of a series of distinctions, in a fairly systematic manner.

In English, for example, we have voicing of consonants as a distinction, and it occurs pretty much wherever it can.  Voicing is the difference between P & B (at the front of the mouth), T & D (in the middle) and C/K & G (at the back).  We also have nasalisation, which takes those three pairs and gives us the sounds M, N and NG.  It's a stable and systematic structure.
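The structure described above can be sketched as a small grid. This is only an illustration: the place and manner labels are my own rough names rather than proper phonetic terms, the letters stand in for sounds, and treating the nasals as a third value alongside voiceless and voiced is a simplifying assumption.

```python
# Illustrative grid only: letters stand in for sounds, as in the post.
# "NG" is the final sound of "sing". The labels are assumptions for this sketch.
GRID = {
    "front":  {"voiceless": "P", "voiced": "B", "nasal": "M"},
    "middle": {"voiceless": "T", "voiced": "D", "nasal": "N"},
    "back":   {"voiceless": "K", "voiced": "G", "nasal": "NG"},
}


def describe(sound):
    """Return (place, manner) for a sound in the grid."""
    for place, row in GRID.items():
        for manner, candidate in row.items():
            if candidate == sound:
                return (place, manner)
    raise KeyError(sound)


def same_distinction(pair_a, pair_b):
    """True if both pairs stay in one place and make the same manner change."""
    place_a1, manner_a1 = describe(pair_a[0])
    place_a2, manner_a2 = describe(pair_a[1])
    place_b1, manner_b1 = describe(pair_b[0])
    place_b2, manner_b2 = describe(pair_b[1])
    stays_in_place = place_a1 == place_a2 and place_b1 == place_b2
    return stays_in_place and (manner_a1, manner_a2) == (manner_b1, manner_b2)


print(same_distinction(("P", "B"), ("T", "D")))   # same voicing contrast
print(same_distinction(("P", "B"), ("K", "NG")))  # different contrast
```

The first call prints True and the second False: P/B and T/D are the same difference made in two places, which is exactly what lets a teacher refer back to the first pair when introducing the second.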

There are other languages (EG Gaelic) where the distinction between P & B is not one of voicing, but aspiration.  The same distinction carries through for T & D and C/K & G.  In fact, it's hard to find any language that has a voicing distinction on one of those pairs, but makes a distinction in aspiration on another -- in general, the same distinction carries through.

Polish gives a great example of how regular these consonant distinctions can be.
In the diagram above, you can see a clear structure uniting 12 sounds in 3 distinctions (two 2-way distinctions, one 3-way distinction).  It's almost entirely systematic -- this cannot happen by accident, so we must assume that the native speaker's internal model of language acts on the level of these distinctions.

For this reason, I believe that it is not enough to draw the learner's attention to an individual phoneme, but that we must teach them the individual distinctions.

This doesn't have to be done in a dry "linguistics" way, though.

Teach once, then repeat

When teaching a phonemic distinction like voicing or aspiration, you don't need to start with the idea in the abstract.  Instead, you can start by teaching the pronunciation of one letter, then its contrast (eg P first, then B).  In teaching the contrast, you pick a word that describes it ("puffiness" or "breathiness" is more meaningful than "aspiration") or you just describe it.  Then when you move onto the next pair (T,D), you can refer back to the first pair, because it's the same difference.  And once you get to the final pair (K,G), it'll be very easy to do.

Of course, this means that you have to restrict the number of phonemes to start off with, but there are many people who are theoretically in favour of gradually introducing phonemes -- it's just the order of material that messes them up.

Teaching one thing at a time

Most teachers like to start with seemingly useful words and phrases.  Hello, how are you, goodbye -- that sort of thing.  This takes away the teacher's control over the phonemes -- teachers don't choose them, they just use whichever ones pop up.

Worse, quite a lot of teachers will introduce numbers early on, and in many languages you'll have encountered half of the phonemes of the language by the time you reach ten.  (This probably isn't an accident -- ambiguity in numbers would be a problem, so they naturally evolve to be fairly different.)

One commercial course points out this problem, and suggests that the way round it is to teach numbers one at a time, in a way which supports a progressive increase in the number of phonemes.  The example they used was 10 and 100 in Spanish: diez and cien.  These two words share all but one phoneme (C before I or E is pronounced the same as Z in Spanish), so if you teach one then the other, you're only introducing one phoneme the second time round. 
(I think I remember which course this was, but the blurb on the website no longer mentions this, so I'm not going to link to it.)
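As a rough sketch of that idea, you can count how many new sounds each word brings in for a given teaching order. The transcriptions below are my own letter-based approximations, not a real phonemic transcription, with "z" standing for the sound shared by Z and by C before I or E; the function name is hypothetical.

```python
# Rough, letter-based stand-ins for the sounds of each word (an assumption
# for illustration, not a proper transcription).
TRANSCRIPTIONS = {
    "diez": ["d", "i", "e", "z"],
    "cien": ["z", "i", "e", "n"],
}


def new_phonemes(word_order, transcriptions):
    """For each word in order, list the sounds not met in any earlier word."""
    known = set()
    report = []
    for word in word_order:
        # dict.fromkeys drops repeats inside a word while keeping the order
        fresh = [p for p in dict.fromkeys(transcriptions[word]) if p not in known]
        known.update(fresh)
        report.append((word, fresh))
    return report


for word, fresh in new_phonemes(["diez", "cien"], TRANSCRIPTIONS):
    print(word, fresh)
```

Taught in this order, diez introduces four sounds and cien only one (the N). The same function could be used to compare different orderings of a longer word list, if you had transcriptions you trusted.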

And after all, why should we teach numbers in numerical order in a second language?  When teaching children numbers in their first language, we're teaching both the concepts and the words, but in a second language you're only teaching the words, because they've already got the appropriate concepts to peg them to.  We can now selectively use any of those pegs we want to, in any order we want to.

Putting it together

So if we teach a couple of consonants well, and then we introduce new consonants one by one, we can use the earlier consonants as an anchor to show repeated distinctions.  It doesn't matter whether the student can consciously remember what those distinctions were -- a native speaker normally wouldn't have a clue.  What matters is that the model the student uses automatically for pronunciation implicitly respects the consistent rules of the language.

This will not happen if the student is left to listen, because one misheard phoneme can threaten the integrity of the entire structure -- pull any one of the sounds out of my neat little Polish diagram and dump it somewhere else and the whole thing will collapse.

Next time

Previously I spoke about sounds as new muscle movements; today I spoke simply about the meaning of sounds.  Next time, I'd like to demonstrate how almost all new sounds really are new physical movements anyway.

10 August 2011

Phonology -- whats and hows

A couple of weeks ago, I was discussing the importance of phonology, trying to demonstrate why it should be consciously dealt with in the teaching/learning process, but I took the decision not to include any comments on how to teach it in that article.  Basically, I didn't want to give anyone any grounds to reject my argument out-of-hand.  In this post, I'd like to cover how I believe it should be taught, but remember that this, the how, doesn't affect my argument on the importance, the why.  Reject my methods if you want, but please don't reject phonology as an area of study.

So, what did I establish in the previous post?
  • Incorrect pronunciation of an individual phoneme leads to problems in pronouncing clusters with that phoneme.
  • Problems in pronouncing certain sequences of phonemes lead to grammatical errors.
  • That vocabulary is harder to learn when you're not familiar with the rules of pronunciation in a language.
  • That not understanding target language phoneme boundaries makes it hard to understand native speakers.
  • That sounds that the learner drops in speech are often matched by a dropping of the corresponding letters in writing.
These are things that I have observed and do not see as particularly controversial.  And yet, my conclusion that pronunciation requires active instruction is rejected by many teachers.  Accent, they say, will take care of itself.  And accent, they say, is a personal thing.  But we're not talking about accent.  Accent is something that is layered on top of phonology.  Phonology is like the basic letter forms in writing, accent is more like individual differences in handwriting.  At school we are taught initially to get the basic forms right, and over the years we develop our own personal "hand".

Can we learn pronunciation from listening?

Some even argue that we learn pronunciation from hearing (and they sometimes add "just like children").  However, as I tried to demonstrate in my recent post receptive skills as a reflective act, there is good reason to believe that we understand language by comparison to our own internal model of the language.  In the follow-up post, I gave a concrete example of mishearing a word on Italian radio, and how my flawed internal model was good enough to understand the message without perceiving every sound.

OK, so that's anecdotal and doesn't prove a general case, but ask yourself this: how many different accents can you understand in your own language?  And how many of those accents can you speak in?

So you can see that simple exposure hasn't given you extra accents.  As I said above, accent is not phonology.  But our brains have learned to ignore accental differences (to an extent) to enable us to understand the widest possible number of people around us.  So if our brain assumes a different phonology is just a different accent, it throws away all the information you're supposed to be learning from.

So I really don't believe it's possible to learn from "just listening", no matter how much you do.

Motherese and exaggeration

Here's the outcome of an interesting study (YouTube video).  It turns out that when we teach kids to speak, we don't expect them to learn from natural speech, but we exaggerate our phonemes, effectively making them "more real than real" or "whiter than white".  And if you think about it, isn't this what we do when speaking to foreigners or people with a very different accent from ours?

The point is that we have to make the differences clear and noticeable, so that one phoneme doesn't blend into another.

I would suggest that this points towards the right answer in language teaching to adults: if even children (who have no preconceptions of what a phoneme is) need extra emphasis to understand the difference between similar phonemes, then us adults (who are biased towards our native language's phonology) really could do with a bit of help.  The brain has to be told that this new information is useful, or it will throw it all away.

Exaggeration of pronunciation appears to help the listener notice the differences.

Learning pronunciation through pronouncing

However, we learn to dance by dancing, and we learn to drive by driving.  In both cases we can pick up a few hints and tips from watching, but we need a heck of a lot of practice.  Why shouldn't this be the case with language?

People are very quick to tell me that language is different from every other skill.  That is a valid opinion, but it is still only an opinion - no-one has ever presented anything to me that demonstrates it to be true, or even likely.  Right now, it's just a theory... and it's one I do not believe.

To me, pronunciation is a muscle skill.  Let's consider some of the extreme sounds that don't occur in English.

Take retroflex consonants.  Retro - backwards; flex - bend.  In retroflex consonants, your tongue bends backwards, and the tip goes behind the alveolar ridge.  This type of sound doesn't occur in English, so a monolingual English-speaker will probably never produce this sound in his life.  If you ask such a person to put their tongue into that position, they won't be able to -- their tongue just can't bend that way.

But then your average person couldn't do yoga postures on a first attempt either -- the yoga teacher will lead them through some simple postures and exercises to encourage the muscles to stretch and strengthen appropriately until they are capable of performing the required movements.

The brain doesn't prepare the muscles just because you've seen the movements; the body prepares the muscles once you've started doing the movements.  Similarly, your brain cannot train the tongue, which is just another muscle, after all -- only the body can do that.

So clearly, there are certain sounds that must be taught consciously, or the learner won't physically be able to say it.  But obviously there are also sounds that the learner is physically capable of saying, but isn't in the habit of saying.

This post is starting to get a bit on the long side, so I'll come back to the question of this second category of sounds next time.

How I learned to pronounce retroflex consonants

I had a notion to learn a few words in various Indian languages a few years ago when I was working in IT support.  Our front-line helpdesk was in India and I wanted to try to build a better rapport with my coworkers.

One of the sources I used stated quite plainly that while languages like French and Spanish let you get away with "close enough" pronunciation (not entirely true...) with Hindi, you would simply not be understood if you spoke in an English-speaker's accent.  It described the retroflex articulation and what I did was to start doing a regime of "tongue stretches" -- as I walked to and from work, I would tap my tongue continually off the roof of my mouth, and move it slowly backwards and forwards, to create a sort of silent T-t-t-t-t-t-t-t-t-t-t or D-d-d-d-d-d-d-d-d-d.  Every day I could reach slightly further back, and in about a week and a half I was able to produce a convincingly Hindi-like retroflex for all of the various consonants (except R, cos that's really quite complicated). I was curious about how far I could go, and within another few days I'd got to the point where I could touch the tip of my tongue to my soft palate.


So certain sounds need to be learned physically, and it's something that can be done.  Next time, I'll start looking at sounds that are more a matter of habit, and showing that the boundary between "habit" and "ability" isn't always that clear.

29 July 2011

The importance of phonology

OK, so I promised this a while ago, and I've let myself get distracted by a few other points in the interim, but I'll try to draw them in and show how they are related to the teaching of phonology in general.

In my posts 4 skills safe and 3 skills safe, I argued that the division of language teaching into the traditional 4 skills of reading, writing, speaking and listening was trivial, superficial and of very little pedagogic value.  Instead, I suggested that we should look at individual skills of syntax, morphology and phonology, and that we could add orthography as an additional, more abstract skill (Lev Vygotsky described reading and writing as "second-order abstractions").
Phonology often gets very little attention in the classroom, as it is seen as a sub-skill of speaking, and speaking's "difficult".  But phonology is fundamental to many languages.

If you haven't already, you might want to take a look at my posts In language, there's no such thing as a common error, and Common errors: My mistake!  In the first post I described a particular common error in written English (might of instead of might have, could of instead of could have etc) and in the second I expanded on the mechanisms that cause this "error", with the aim of showing that this wasn't an "error", but in fact a change in grammar, analogous to changes that have occurred in other languages.  What I didn't focus on there, but which is extremely relevant here, is that this change in grammar is pronunciation-led -- ie the phonology of English has caused this change in grammar.  The prosody of English has led to 've being always weak, and it has lost the link to the related strong form have.

And of course the change in the Romance languages that I mentioned in the second post is also led by phonological patterns.  If you look at any language whatsoever, many grammatical rules have arisen from mere matters of pronunciation.

The archetypal example is the English indefinite article -- a/an.  You may well be aware that like most other Indo-European languages in western Europe, this evolved out of the same root as the number one.  But the modern number one is a strong form and has a diphthong.  A/an is a clitic and always weak, so split off (completely analogous to 've and have).  This weak word /ən/ then lost its [n] before consonants, simply because it's easier to say that way, and retained it before vowels again because it's easier to say like that.  (And if you'll indulge a slight digression, that brings us back to would've etc, because you'll often hear woulda before a consonant and would've before a vowel.)

If you look at the Celtic languages, one of the trickiest parts of the grammar is the idea of initial consonant mutations.  Lenition in Modern Irish is a bit inconsistent (probably due to the relatively large number of school-taught speakers against native speakers), but the three mutations in Welsh are fairly systematic, with mutated forms usually only differing from the radical in one "dimension" of pronunciation.

These sorts of rules become very arbitrary and complex when described purely in terms of grammar, whereas when considered physically, they make a lot more sense.

Let's go back to a/an and take a closer look.  We all know the rule: a before a consonant, an before a vowel, right? Wrong! It's actually: a before a consonant phoneme, an before a vowel phoneme.  To see the difference between the two, fill in the following blanks with a or an:
I want __ biscuit.
I need __ explanation.
He is __ honest man.
I have __ university degree.
Now it's not a difficult task for a native speaker, because you wouldn't normally have to think about it: honest may start with the letter H, but you know intuitively that you don't pronounce it, so you write an without thinking.  Similarly, university may start with the letter U, but you know intuitively that it starts with a y-glide sound (like "yoo", not "oo"), so you write a.
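The difference between the two versions of the rule can be sketched in a few lines. The table of first sounds is hand-written for the four example words and uses rough labels of my own ("y" for the y-glide); it is an assumption for illustration, not a pronunciation dictionary.

```python
# First *sound* of each example word, using rough hand-made labels.
FIRST_SOUND = {
    "biscuit": "b",
    "explanation": "e",
    "honest": "o",       # the H is silent
    "university": "y",   # starts with a y-glide ("yoo")
}
VOWEL_SOUNDS = {"a", "e", "i", "o", "u"}


def article_by_spelling(word):
    """The rule as usually stated: look at the first letter."""
    return "an" if word[0].lower() in "aeiou" else "a"


def article_by_sound(word):
    """The rule as it actually works: look at the first sound."""
    return "an" if FIRST_SOUND[word] in VOWEL_SOUNDS else "a"


for word in FIRST_SOUND:
    print(word, article_by_spelling(word), article_by_sound(word))
```

The two functions agree on biscuit and explanation and disagree on honest and university, where only the sound-based version gives the answer a native speaker would write.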

I have seen quite a few English learners write "an university" or "a honest man" because they are either trying to work from a grammatical rule in isolation from pronunciation, or because they simply pronounce these words wrong.  In the case of honest, the problem is compounded if the student can't pronounce H, because if he follows the rule correctly on paper, he undermines the phonological basis for the true rule.

It follows, then, that we cannot teach grammar without considering phonology.  (And anyone who has succeeded in understanding the French liaison rules can tell you categorically that this is true.)

But how does phonology affect us in other ways?

Phonology and the ease of vocabulary learning

It may seem trivial, but for his PhD thesis, an Australian teacher of Russian demonstrated that it is easier to learn foreign words that are possible in your native language than ones that aren't.  EG the word brobling with first-syllable stress is easy, brobling with second-syllable stress is a bit harder, grtarstlbing with lots of consonant clusters that can't occur in English is very difficult.  He then took a massive leap of logic that I'll examine later in greater depth.

This corresponds with what a lot of teachers believe, but few teachers have the time or patience to implement: that it's easier to teach phonemes one at a time and reuse them in different words.  Again I'll come back to that when I start discussing techniques.

For now, though, I'll simply suggest that it's easier to learn words that are made out of familiar "blocks" than ones that aren't.  It follows from this that good teaching of phonetics (whatever that means) is a prerequisite to vocabulary learning.

Phonotactics: the "crisps" problem

My high school had an exchange programme running with a school in France.  Teenagers are naturally curious beasts, and when my big brother and sister first went on one of these exchanges, the class discovered how funny it was to get the French people to say crisps (UK English for what the French and Americans call chips).  Very few of the French kids could actually pronounce it, because they were using French phonemes with a northern accent (the school was near Lille).  The French P is unaspirated (unlike English) and the French S is quite slender and hissy.  As a combination of sounds, French SPS is difficult, nearly impossible -- the P either gets lost in the hiss or one of the Ses gets cut short.  The English combination is physically much easier.

Similar problems occur in other places.  Spanish people find wants quite difficult to say, because Spanish T is not compatible with Spanish N or S due to the method of articulation.  NTS in Spanish needs the tip of the tongue to be in two different places at once -- the alveolar ridge for N and S and the gumline for T.

The problem is that many books will tell us that T, D, B, P etc are sufficiently similar in English and Spanish, French or whatever that we can use them equivalently, but this is only true for each phoneme in isolation.  Once we start trying to combine them, the differences start to accumulate.

Which brings us back to:

Grammar again - and how writing suffers for it

If you cannot pronounce the inflectional affixes in a language, your grammar suffers.  Many, many Spanish learners of English drop their -s and -ed suffixes because of the problems of incompatible sounds.  They replace it's with is.  These mistakes filter through from their pronunciation into their internal model of grammar and eventually into their writing.  But it's easy to ignore this, because most of the time they correct their own writing mistakes with their declarative knowledge, and on the few occasions where they don't correct it, the teacher simply tells them the rule again, but never attacks the root cause of the problem.  If they learned to pronounce English [t] and [d] phonemes, most of the difficult sound combinations would become much, much easier, their internal model of the grammar would be built up to incorporate these non-syllabic morphemes (and there are no non-syllabic morphemes in Spanish as far as I know, so it's a totally new concept to them -- edit (2-feb-2014): Spanish has at least one non-syllabic morpheme: plural S after a vowel) and they would write naturally based on their procedural knowledge of the grammar.

And finally...
Allophones and comprehension

Apparently there are certain accents that are considered "hard" in some languages. Now I'm not implying that there is no such thing as a hard accent, but I do believe that most of the difficulties stem from the teaching, not from the language.

In Spain, the accent of Madrid is considered quite difficult to understand.  The reason for this is that the madrileño accent tends to lenite (weaken or soften) its non-intervocalic consonants.  The classic is the weakening of D to /ð/ (roughly equivalent to TH of then).  There is little physical similarity between the English D and ð as is clear from their technical descriptions: /d/ - voiced alveolar plosive; /ð/ - voiced dental fricative.  But the Spanish /d/ is a voiced dental plosive, which the description shows is quite similar to /ð/.  Basically, the soft D in Madrid is an incomplete hard D -- the tongue doesn't quite go far enough to touch the teeth and stop the sound, but instead it hisses slightly.

Now, if understanding language is a reflective act (as I claim here and here) then we understand sounds by considering what shape our mouths would be in if we were to make the sound we hear (something suggested by the concept of mirror neurons).  The soft and hard Ds in Spanish are not "soundalike" allophones at all, but they have a similar shape, which is different from the English D.  To me it seems clear that physically learning the Spanish hard D shape would result in better comprehension of the similarly shaped soft D in a way that simply hearing it won't accomplish.


Conclusion

All in all, it seems to me that phonology is an intrinsic component of language, and that the system of a language falls apart when phonology is not given the proper support throughout the learning process.

As for how to teach phonology, I have my own views, but I'm currently reading up on some alternative opinions so as to give a more balanced write-up of the options available.

29 May 2011

Mechanics' Meaningful Music

It is often claimed that an adult cannot learn the sound system of a new language.  This claim is followed by the caveat that some adults do, but these adults are dismissed as exceptional, and non-typical.  Certainly, they are exceptions, because most people don't, but there's a big difference between "don't" and "can't".

A sound system is composed of phonemes, which are often defined as minimal units of meaning in sound.  Every human is capable of producing a whole range of sounds, regardless of their language, and to process every single detail of the sound produced would simply be too much for the brain, so we bundle the sounds up together, and even though the sounds of the T in "try" and "butter" may be slightly different (or completely different, depending on your accent), we still recognise them as being the same thing.

As a general strategy, this works.  The adult human meets lots of people with slightly different accents, but the phonemes are all roughly the same, so the detail of the differences is irrelevant.

As a language learner, though, this starts to pose problems.  Our brains believe that only certain sounds are meaningful, and therefore discard any information they believe to be irrelevant.  If you have a language with two phonemes equivalent to one in your language, you will not believe the distinction is meaningful.  Just take a look at Japanese, where they have one phoneme equivalent to the English L and R.  Many Japanese learners of English cannot hear the difference between "law" and "raw" or "appear" and "appeal" without active concentration.

Most people get stuck in this rut their whole lives, and this is used as evidence that you can never learn the sound system.  But if we step outside the world of language, we might just find reason to be more optimistic.

It's amazing what an experienced mechanic can determine about a car or other machine just by listening to it.  Sounds that to you or me would just be squeaks and squeals are to him a full description of the workings and faults of the engine.

Mechanics develop this skill over time through a mixture of direct instruction and experiential learning.  The engines they work on give constant feedback that develops into a meaningful structure -- if a given whine co-occurs with a drop in revs, the two become associated and the sound takes on meaning.

But by this reasoning, surely language itself should give a meaningful framework to sounds?

Unfortunately, it would appear not, and it isn't actually that hard to see why.

Language has evolved to have a certain amount of redundancy, a certain level of "fault tolerance".  It is very difficult indeed to find any complete sentences that function as minimal pairs (ie that differ by one phoneme only), particularly within the restricted language set that most beginners are faced with.

Going back to my earlier examples, "law" is a noun, "raw" is an adjective.  There will always be enough information in the context to tell the two apart.  "Appeal" and "appear" are both verbs, but the usages are distinct.
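For anyone who wants the notion of a minimal pair pinned down, here is a minimal sketch of the test for single words: same number of sounds, exactly one of them different. The sound labels are rough ones of my own ("aw" for the vowel of law), and the function name is hypothetical.

```python
def is_minimal_pair(sounds_a, sounds_b):
    """True if the two sequences have equal length and differ in exactly one position."""
    if len(sounds_a) != len(sounds_b):
        return False
    differences = sum(1 for a, b in zip(sounds_a, sounds_b) if a != b)
    return differences == 1


# Rough sound labels, not a proper transcription.
law = ["l", "aw"]
raw = ["r", "aw"]

print(is_minimal_pair(law, raw))
print(is_minimal_pair(law, law))
```

The first call prints True and the second False. Applied to whole sentences rather than single words, the same test would almost never come out True in beginner material, which is the point being made here.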

Essentially, I believe that the average learner is never really forced to build a meaningful framework for these differences.  The end result is that they get deeper and deeper into the language, building more and more coping mechanisms that a native speaker would never rely on.  The model of the language they build is wrong, and while they can understand most things they hear, the person they are speaking to often cannot understand them because, as I said, the native speaker doesn't employ the same strategies as the learner.

What is needed is for the teacher to force a meaningful framework, and the only way I can see that happening is through early teaching of pronunciation.  If a learner has to pronounce the difference between ż and ź in Polish, and is corrected when using the wrong one, his brain will know there's a meaningful difference.

It may not be fashionable, but some negative feedback is definitely necessary....

11 April 2011

The importance and unimportance of accent

Accent is essentially unimportant.  It's the final coat of paint that makes our language pretty or ugly, shiny or dull.  It is something that the beginning learner really doesn't need to think about.

Unfortunately, this is something that is frequently overinterpreted, because many people don't appreciate the fact that accent is only one part of pronunciation.

Every language has its own phonology -- it has a set of possible sounds and possible combinations of sounds, and it has a set of distinctions between sounds.  Though we do not need to learn a good accent from day one, we certainly need to learn the sound system of a language.

The most common consequence of conflating the sound system with accent is the idea that the "closest sound" from your native language is "good enough".  Open up almost any beginner's book and it'll start with a list of sounds described along the lines of "like the a in cat", "like t in English".  But this is rarely true.

Still, some languages will let you get away with this to some extent, but when you hit a more complicated language, it all crumbles.

Any book on Hindi will tell you that "closest sounds" just won't cut it, and that with that approach you will never be understood.  This is because Hindi has more sounds than most other languages.  In fact, there are 8 sounds that are approximately similar to T and D, so using an English T and D, or a French one, or a German one, would leave you completely unable to distinguish certain words, and unable to make yourself understood.

Worse, because you treat four different sounds as one (four T-like sounds as a single T, four D-like sounds as a single D), you will never learn to hear the difference either -- your brain only distinguishes sounds that mean something.  The later you attempt to fix it, the harder it will be, because you will have to relearn all the vocabulary in order to learn the difference.

I've experienced this personally with Spanish.  In Spanish C is pronounced like Z, when the C is followed by I or E.  In some areas, these in turn sound like S, but in other areas, they don't.  I started learning Spanish from a course that didn't make a distinction between S and Z, but as I progressed I spent more of my time with people who make the distinction than those who don't.

As a result, I started trying to speak like them.  However, my brain was trained to see the two things as one, so I was prone to making mistakes such as pronouncing the word "especial" as "ezpesial".  My errors were arbitrary, but not random -- they were consistent and there was a clear pattern.  My brain was still seeing the two as equivalent, but was trying to explain it in terms of the other sounds in the word.   Thankfully this was still pretty early, so I caught it and fixed it.

Other people are not so lucky.  A local French teacher (from France) can pronounce all the sounds of English.  But he couldn't before he came here.  The result is that he has already learned all the words with the wrong sounds.  The classic example is TH -- it's always T or D when he speaks.  He's learnt the words now, so there's no going back.

Now, that's something that we call "falling together" (ah, a nice, self-descriptive term for once), but phonemes can also split apart.

Consider that Japanese doesn't make a distinction between liquids L and R.  As an English speaker learning Japanese, I would likely hear these as different phonemes (meaningful units of sounds) instead of simply different ways of pronouncing the same phoneme ("allophones").

This splitting apart on its own isn't a big problem -- it doesn't lose information the same way falling together does.  However, the two can very easily co-occur,  and at that point they make a bad situation worse.

Take for example "CH" in German.  It has two allophones -- a hard one (ach) similar to the sound in Scottish "loch", and a softer palatalised one (ich) that takes on a quality similar to the English SH.  But there's another phoneme that sounds even more like the English SH, one that's usually written SCH.  And it gets worse, because in certain combinations of consonants, S starts to sound the same.
If we're not careful, the learner may end up splitting their Ss and their CHs, and putting half of each in the same box as SCH.  The result is a map of the sound system which looks nothing like the native speaker's view of things.

Accent is what we put on top of the sound system to give it colour and personality.  You cannot develop a good accent based on an incorrect map of the sound system.  Pronunciation has to be taught from the start in such a way as to encourage a consistent and correct sound system.

Accent can wait until later, but pronunciation must be taught in some form right from the start.