davewantsmoore Posted January 22, 2016 Posted January 22, 2016 The key claim of those associated with the MQA technology is that humans can hear time resolution down to 7us (microseconds). Let's be very pedantic .... and clarify this. Meridian do not claim this. They refer to others (proper experts) who claim it. Nobody is refuting it. 1
davewantsmoore Posted January 22, 2016 Posted January 22, 2016 This time resolution cannot be achieved by CD sampling rate, therefore necessitating a sampling rate of 192kHz. This is their claim. Again.... the specifics are quite important. Time resolution might mean: 1) A signal (independent of its frequency) can start and stop at a certain time 2) A signal has a period of (shorter than) X length 1) Says, for example... that you can have a 10kHz signal (or whatever complex signal, containing only frequencies less than 20kHz for 44.1kHz sampling rate) ..... where can this signal occur in time. 2) Says, how quickly can "a signal jump up and down". (ie. how high frequency can it be) Meridian do NOT claim that 1) cannot be achieved with 44.1 (or whatever) sampling rate. Indeed, they actually link to references from their patent which demonstrate it CAN BE .... and that says that the time resolution of PCM is essentially infinite. ... and so, Meridian mean #2. They mean that a signal must be able to change faster than 20kHz (for 44.1k sampling rate for example). That is, they mean to say that we need to have the high frequency content. Sorry if you already understand what I just said ...... others might not.... and it can get very confusing if what is being discussed is not exactly defined.
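Meaning #1 is easy to check numerically — a minimal numpy sketch (the 10kHz tone and the 1us shift are illustrative choices, not from the post): shift a band-limited tone by far less than one sample period and the 44.1kHz sample values still change, so the shift is encoded.

```python
import numpy as np

fs = 44100                      # CD sampling rate
f = 10000                       # well inside the 0-22.05 kHz band
t = np.arange(64) / fs          # a handful of sample instants

a = np.sin(2 * np.pi * f * t)           # tone starting "now"
b = np.sin(2 * np.pi * f * (t - 1e-6))  # same tone, 1 us later

# The shift is ~1/23 of a sample period, yet the sample values
# differ, so PCM encodes it: where a band-limited signal sits in
# time is not quantised to the sample period.
print(np.max(np.abs(a - b)))
```

The samples differ by a small but perfectly representable amount, which is all "the time resolution of PCM is essentially infinite" means here.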
a.dent Posted January 22, 2016 Posted January 22, 2016 So according to that Yamaha research (and I'm obviously paraphrasing here) listening to 96/192 is only going to be an advantage in a controlled studio environment and sitting in the sweet spot. With "normal" speakers that can't react in 6us, 48kHz sample rate is all that is necessary.
davewantsmoore Posted January 22, 2016 Posted January 22, 2016 http://www.yamahaproaudio.com/global/en/training_support/selftraining/audio_quality/chapter4/02_audio_universe/ In a nutshell it goes like this. Whilst humans can't generally hear audio frequencies above 21kHz, this is only true for continuous audio signals, e.g. like a laboratory test tone. The kicker is that most audio signals are not continuous. The white paper went on to explain that our hearing actually has a theoretical upper limit of 18 MHz, translating to a temporal resolution of 0.055 microseconds! So what is the real world limit? Scientifically controlled blind testing by Dr Kunchur shows that we can indeed hear down to 6us or even lower. The Yamaha page shows that moving one channel of a stereo pair very slightly towards the listener was audible..... and concludes that the signals between the left and right speaker should be able to be aligned with each other, down to <insert very small delay>. PCM (even 44.1kHz) can already do this. You can have a 7kHz sinewave (at 44.1kHz sampling rate). In the left channel it can start now ... and in the right channel it can start 1 microsecond (or whatever) later. Their statement that: Note that the reciprocal of 6 microseconds is 166kHz - indicating that an audio system should be able to process this frequency to satisfy this timing perception Is based on the same misunderstanding discussed previously. In the sense they are talking about, PCM has infinite time resolution. The audio system only needs to be able to "process" 166kHz frequencies if you want to reproduce those frequencies. For a correctly band-limited signal (eg. 0-20kHz for 44.1kHz), there is no "time resolution" limit.
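The 7kHz example can be demonstrated end to end — a sketch (the frame length and the FFT-phase method of recovering the delay are my choices, not from the post): encode a 1us inter-channel delay at 44.1kHz, then read it back out of the samples.

```python
import numpy as np

fs = 44100
f = 7000
delay = 1e-6                    # 1 us inter-channel delay
N = 4096
t = np.arange(N) / fs

left = np.sin(2 * np.pi * f * t)
right = np.sin(2 * np.pi * f * (t - delay))  # right channel starts 1 us later

# Recover the delay from the 44.1 kHz samples via the phase
# difference at 7 kHz: delay = dphi / (2*pi*f)
w = np.hanning(N)
L = np.fft.rfft(left * w)
R = np.fft.rfft(right * w)
k = round(f * N / fs)                        # FFT bin nearest 7 kHz
dphi = np.angle(L[k] * np.conj(R[k]))        # wrap-safe phase difference
print(dphi / (2 * np.pi * f))                # ~1e-06 seconds
```

The delay comes back out of ordinary 44.1kHz PCM to well under a microsecond of accuracy, which is the point being made about inter-channel alignment.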
davewantsmoore Posted January 22, 2016 Posted January 22, 2016 (edited) With "normal" speakers that can't react in 6us, It depends on what you mean by "react to" 6us. If we have a signal.... which contains frequencies from 0 to 20kHz ..... the signal can start now ... or it can start in 1 microsecond (or even less) from now .... or 7.... or whatever. The time resolution of the speaker (and of PCM) .... is not limited. In the experiment from the Yamaha page, they made the signal in one speaker start 6us (or whatever it was) after the other. They did it by moving the speaker. They could have delayed the signal. Both the speaker (and PCM) can do that. If on the other hand you mean that the signal started (now) .... and the delay between now and when the signal can change next is 6us, then this is referring to the frequency of the signal. A signal which changes after 6us... has a frequency of 166kHz. This change in a signal is not audible. Edited January 22, 2016 by davewantsmoore
davewantsmoore Posted January 22, 2016 Posted January 22, 2016 Well Yamaha claims that in a professional studio Where did they talk about a "studio" being "needed"? Adding a very small delay between two signals (eg. between the right and left speaker) produces audible effects; this is very well known, and is just the physics of waves. In that regard, I would expect the "time resolution" we can be sensitive to is essentially infinitely small. .... but this doesn't call for sampling rates higher than double the frequency we want to represent.
hochopeper Posted January 22, 2016 Posted January 22, 2016 Replace Kunchur below with whatever other group (MQA) are using his work or similar as a justification for their marketing spiel. Kunchur's paper has been identified as being incorrect regarding the temporal resolution of redbook audio. Kunchur's experimental findings on temporal resolution of human hearing are correct. Redbook audio resolution is just fine to meet the requirements of human hearing as identified by Kunchur's study. We (audiophiles) should be happy we've already got all we need. The people who make our DACs may want to optimise their filter and need to adjust the sample rate up to apply that in a way that provides optimal performance for their hardware. We don't need a higher sample rate file to be able to perform that task, it is actually better to leave it to the DAC designer as part of their design. 3
davewantsmoore Posted January 22, 2016 Posted January 22, 2016 Redbook audio resolution is just fine to meet the requirements of human hearing as identified by Kunchur's study. We (audiophiles) should be happy we've already got all we need. I would like to think that is how I would sound if I wasn't trying to "explain it". Perhaps a touch delusional that I could be so lucid.
hochopeper Posted January 22, 2016 Posted January 22, 2016 I would like this that is how I would sound if I wasn't trying to "explain it". Perhaps a touch delusional that I could be so lucid. Just trying to extract out a simple take home message for the punters 1
Guest rmpfyf Posted January 22, 2016 Posted January 22, 2016 Again... It's important to understand differences between: Psychoacoustics vs neuroscience Continuous audio vs attack Intended system response vs amplitude response possible Recording vs playback considerations DAC processes vs discrete sampling Message for the punters? Max out your Redbook, then let's talk
Guest Eggcup The Daft Posted January 22, 2016 Posted January 22, 2016 In a nutshell it goes like this. Whilst humans can't generally hear audio frequencies above 21kHz, this is only true for continuous audio signals, e.g. like a laboratory test tone. The kicker is that most audio signals are not continuous. The white paper went on to explain that our hearing actually has a theoretical upper limit of 18 MHz, translating to a temporal resolution of 0.055 microseconds! I should probably stay a million miles away from this thread... but it seems to me that the 18MHz is the theoretical limit only if white noise is used (to get all of the hairs excited), and would be the bandwidth of the aural system, nothing to do with the limit of hearing. It would also be dependent on the nerve being able to transmit what the hairs are doing coherently. I also note that all of these experiments are dependent on square waves being analogous to music, and the movement of the speaker being analogous to a change in a recorded signal. I'm left wondering if what is being detected is a byproduct of the rise time of the square wave in the generating system being used, which is why later experiments often report shorter time differences than earlier ones using the same method.... possibly moving the speaker further would result in the difference becoming undetectable again, if that was the case?
davewantsmoore Posted January 22, 2016 Posted January 22, 2016 18MHz is the theoretical limit It may well be infinite, but redbook audio already provides that. 1
Guest Eggcup The Daft Posted January 22, 2016 Posted January 22, 2016 It may well be infinite, but redbook audio already provides that. I really should learn to make myself clearer. That 18MHz limit, from the Yamaha article: But the cochlear nerve string includes as much as 30,000 afferent neurons, their combined firing rate theoretically could reach up to 18 MHz - with a corresponding theoretical time/phase detection threshold of 0.055 microseconds. It's the internal transmission system from the ear to the brain that has that 18MHz limit. I can't see how they can justifiably calculate a detection threshold from that particular number. You'd have to simultaneously stimulate each neuron for a start. Hence my comment about white noise (as you need a lot of different frequencies to have a chance of doing that).
LHC Posted January 22, 2016 Posted January 22, 2016 (edited) So according to that Yamaha research (and I'm obviously paraphrasing here) listening to 96/192 is only going to be an advantage in a controlled studio environment and sitting in the sweet spot. With "normal" speakers that can't react in 6us, 48kHz sample rate is all that is necessary. Not quite correct. 48kHz sample rate could at best resolve down to 20us, a far cry from the experimentally determined human limit of 6us. To achieve that 6us limit 192kHz sampling is required. What Yamaha is saying (paraphrasing) is that this limit may be achievable in a studio environment. Outside of the studio, implementation and real world issues would mean we should aim for a more realistic threshold of 10us which they still classified as very high quality. The 10us resolution corresponds to a sampling rate of 96kHz. Edited January 22, 2016 by LHC
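The figures LHC quotes are just the sample periods of each rate — a one-liner confirms them (simple arithmetic, nothing more):

```python
# Sample period (time between successive samples) for each rate
# mentioned in the thread
for fs in (44100, 48000, 96000, 192000):
    print(f"{fs:>6} Hz -> {1e6 / fs:.1f} us between samples")
```

48kHz gives ~20.8us per sample, 96kHz ~10.4us, and 192kHz ~5.2us, matching the 20us/10us/5-6us figures being debated (whether "time between samples" is the right measure of temporal resolution is exactly what davewantsmoore disputes above).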
a.dent Posted January 22, 2016 Posted January 22, 2016 Not quite correct. 48kHz sample rate could at best resolve down to 20us, a far cry from the experimentally determined human limit of 6us. To achieve that 6us limit 192kHz sampling is required. What Yamaha is saying (paraphrasing) is that this limit may be achievable in a studio environment. Outside of the studio, implementation and real world issues would mean we should aim for a more realistic threshold of 10us which they still classified as very high quality. The 10us resolution corresponds to a sampling rate of 96kHz. But isn't it pointless having a sample rate that high if the speakers can't react that quickly?
LHC Posted January 22, 2016 Posted January 22, 2016 (edited) But isn't it pointless having a sample rate that high if the speakers can't react that quickly? That is a valid point, an individual's mileage will depend on their system's resolving capability. Some speakers may be more capable than others. Certainly there are well respected audiophiles who claim they can hear the differences with high resolution audio (and there are audiophiles who can't). YMMV. Edited January 22, 2016 by LHC
Volunteer sir sanders zingmore Posted January 22, 2016 Volunteer Posted January 22, 2016 (edited) Now I'm confused. I thought, following davewantsmoore's response, the 5kHz wave would lose the 28kHz modulation and return to a sine wave. Have I got that wrong? I am however aware that it is the lower harmonics that provide the distinctive sound of an instrument. The relative 'volume' of the harmonics creates the distinction. I guess I'm open to the idea that even these tiny upper harmonics could possibly change the perception of a sound. Wouldn't any modulation of the 5kHz wave already be captured in the recording? Edited January 22, 2016 by Sir Sanders Zingmore 1
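The modulation question can be checked directly: amplitude-modulating a 5kHz tone at 28kHz puts sidebands at 23kHz and 33kHz, both of which a 20kHz low-pass removes, leaving the bare sine. A minimal numpy sketch (the high "analog" rate, frame length, and modulation depth are illustrative choices):

```python
import numpy as np

fs = 384000                     # high "analog" rate for illustration
N = 7680                        # chosen so 5, 23, 28, 33 kHz land on exact FFT bins
t = np.arange(N) / fs

carrier = np.sin(2 * np.pi * 5000 * t)
# 5 kHz tone amplitude-modulated at 28 kHz -> sidebands at 23 and 33 kHz
am = (1 + 0.5 * np.sin(2 * np.pi * 28000 * t)) * carrier

# Brick-wall low-pass at 20 kHz, as a 44.1 kHz chain must apply
X = np.fft.rfft(am)
X[np.fft.rfftfreq(N, 1 / fs) > 20000] = 0
recovered = np.fft.irfft(X, N)

# The modulation is gone: what remains is the plain 5 kHz sine
print(np.max(np.abs(recovered - carrier)))  # ~0 (numerical noise)
```

So yes: after band-limiting, the 28kHz modulation is stripped off and the 5kHz wave returns to a sine, exactly as the earlier post suggested.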
LHC Posted January 22, 2016 Posted January 22, 2016 What's "off topic" about my question? I was hoping someone in the know would clarify it for me. Sorry, my comment wasn't targeting you specifically. I left this thread for a while and am just returning my attention to it after reading those Yamaha white papers. Your posts talked about hearing the differences between violins. My understanding is that the 'voice' of a violin, i.e. its timbre, is determined by its frequency distribution. Only 2-4% of the violin's power output is above 20kHz, so the bulk of the frequencies that characterise the violin's timbre are within the 20kHz audio band. So here I do agree with Newman for this particular case when he said it is the content within the 20kHz that mattered. My aim was to pivot the discussion away from the audible frequency band to temporal resolution, where our hearing seems to have a different threshold. It is not about reproducing timbre, but about resolving the individual signals/notes.
LHC Posted January 22, 2016 Posted January 22, 2016 (edited) Again.... the specifics are quite important. Time resolution might mean: 1) A signal (independent of its frequency) can start and stop at a certain time 2) A signal has a period of (shorter than) X length 1) Says, for example... that you can have a 10kHz signal (or whatever complex signal, containing only frequencies less than 20kHz for 44.1kHz sampling rate) ..... where can this signal occur in time. 2) Says, how quickly can "a signal jump up and down". (ie. how high frequency can it be) Meridian do NOT claim that 1) cannot be achieved with 44.1 (or whatever) sampling rate. Indeed, they actually link to references from their patent which demonstrate it CAN BE .... and that says that the time resolution of PCM is essentially infinite. ... and so, Meridian mean #2. They mean that a signal must be able to change faster than 20kHz (for 44.1k sampling rate for example). That is, they mean to say that we need to have the high frequency content. Sorry if you already understand what I just said ...... others might not.... and it can get very confusing if what is being discussed is not exactly defined. Thank you for your post. It does appear that the definition of 'resolution' can cause confusion. So maybe 'signal resolution' may be a less ambiguous term. It is very interesting to follow this up with what Professor Kunchur wrote in his FAQ page (http://boson.physics.sc.edu/~kunchur/papers/FAQs.pdf) "Temporal resolution and digital signals In most fields of science, "to resolve" means to "substantially preserve the essence of the original signal" and in particular to preserve enough information in the signal so that it can "become separated or reduced to constituents" (e.g., please see v.tr. [11] and v.intr. [2] under http://www.thefreedictionary.com/resolve). If the constituents cannot be separated and have merged together, the signal's essence has been killed.
However, a certain other definition exists which pertains to the smallest time shift that produces a difference in the final digital code; this resolution allows noticing differences in the "degrees of death" of the killed signal rather than the system’s ability to preserve sonic details and convey them to the ear. In psychoacoustics and auditory neurophysiology, the former definition applies. Below I give optical and audio examples to explain this further. Optical example: A binary star system is imaged through a telescope with a CCD. First, there is the analog optical resolution that is available, which depends on the objective diameter, the figure (optical correctness) of the optics, and seeing (atmospheric steadiness). This optical resolution is analogous to the "analog bandwidth". Because this resolution is limited, a point source becomes spread out into a fuzzy spot with an intensity profile governed by the point spread function or (PSF). Next we are concerned with the density of pixels in the CCD. To avoid aliasing, the pixel spacing L must be finer than the optical resolution so that the optics provides "low pass filtering". If the pixels and their separation are larger than the separation of the centers of the two star images, the two stars will not be resolved separately and will appear as a single larger merged spot. In this case the essential feature (the fact that there are two separate stars and not an oblong nebula) has been destroyed. This is usually what is meant by "resolution" or the lack of it. The number of bits N that can differentiate shades of intensity ("vertical resolution") has little to do with this – no number of vertical bits can undo the damage. However, details of the fuzzy merger do indeed depend on N: if the star images are moved closer together, the digital data of the sampled image will be different as long as the image shift exceeds L/N. 
This L/N definition of resolution applies to the coding itself and not to the system's ability to resolve essential features in the signal as described above (otherwise, the average 6" backyard telescope with a 12 bit CCD would have a resolution that is < 0.001 arc seconds, which is better than the ~0.1 arc seconds resolution of the research grade telescopes!). Digital audio recording: In my papers, statements related to "consumer audio" refer to CD quality, i.e., 16 bits of vertical resolution and a 44.1 kHz sampling rate (when the work for these papers was begun around 2003, 24bit/96kHz and other fancier formats were not in common use in people's homes for music reproduction). For CD, the sampling period is 1/44100 ~ 23 microseconds and the Nyquist frequency fN for this is 22.05 kHz. Frequencies above fN must be removed by anti-alias/low-pass filtering to avoid aliasing. While oversampling and other techniques may be used at one stage or another, the final 44.1 kHz sampled digital data should have no content above fN. If there are two sharp peaks in sound pressure separated by 5 microseconds (which was the threshold upper bound determined in our experiments), they will merge together and the essential feature (the presence of two distinct peaks rather than one blurry blob) is destroyed. There is no ambiguity about this and no number of vertical bits or DSP can fix this. Hence the temporal resolution of the CD is inadequate for delivering the essence of the acoustic signal (2 distinct peaks). However this lack of temporal resolution regarding the acoustic signal transmission should not be confused with the coding resolution of the digitizer, which is given by 23 microseconds/2^16 = 346 picoseconds. This latter quantity has no direct bearing on the system's ability to separate and keep distinct two nearby peaks and hence to preserve the details of musical sounds. 
Now the CD's lack of temporal resolution for complete fidelity is not systemic of the digital format in general: the problem is relaxed as one goes to higher sampling rates and by the time one gets to 192 kHz, the bandwidth and the ability to reproduce fine temporal details is likely to be adequate. I use the word "likely" rather than state definitely for two reasons. In our research we found human temporal resolution to be ~5 microseconds. This is an upper bound: i.e., with even better equipment, younger subjects, more sensitive psychophysical testing protocols, etc., one might find a lower value. The second reason to not give an unambiguous green signal to a particular sampling rate is that the effective bandwidth that can be recorded is less than the Nyquist frequency because of the properties of the anti-aliasing filter, which is never perfect in real life. One more thing I want to add is that one forum poster inquired whether the blurring is an analog effect and not a digital one (“… this isn't a samplingrate issue, it's a simple question of linear filtering…"). But the two are not separate. While it is true that the smearing may take place in the analog low-pass filter circuitry before the signal reaches the ADC, the low-pass filter cutoff is dictated directly by the sampling rate. The exact amount of smearing and other errors will depend on the slope and other details of the filter, but the big-picture conclusion is still the same." EDIT: the underlining was done by me for emphasis Edited January 22, 2016 by LHC 2
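Kunchur's "two peaks merging" point is easy to reproduce numerically — a sketch (the 1.92MHz stand-in "analog" rate and the click positions are illustrative): two clicks 5us apart, pushed through a brick-wall low-pass at the 44.1kHz Nyquist frequency, come out as one broad pulse.

```python
import numpy as np

fs = 1920000                    # stand-in for the "analog" domain
N = 8192
x = np.zeros(N)
i = 3000
x[i] = 1.0
x[i + round(5e-6 * fs)] = 1.0   # second click ~5 us later

# Brick-wall low-pass at 22.05 kHz (the Nyquist limit of 44.1 kHz PCM)
X = np.fft.rfft(x)
X[np.fft.rfftfreq(N, 1 / fs) > 22050] = 0
y = np.fft.irfft(X, N)

# Count distinct local maxima above half the peak: the two clicks
# have smeared into a single pulse roughly 45 us wide
half = 0.5 * y.max()
peaks = sum(1 for k in range(1, N - 1)
            if y[k] > half and y[k] > y[k - 1] and y[k] > y[k + 1])
print(peaks)  # 1, not 2
```

This illustrates Kunchur's claim about the waveform; whether the merged pulse versus two clicks is audibly different is the separate, contested question in this thread.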
Bilbo Posted January 22, 2016 Posted January 22, 2016 We are on the same page here. I agree that the bulk of timbral information is below 20kHz but there is a minor amount that exists above that threshold. I asked what appears to be a dumb ass question regarding what a HF modulated 5kHz sine wave looks like after ADA conversion. I still don't think I've got a conclusive answer yet. My point is this. Are we capable of "hearing" differences in a sound when those HF frequencies are removed, even though on their own we cannot detect them (say 28kHz)? Are our temporal capabilities greater than just our frequency response? If so then logically 192kHz does matter. But then this is "audio"!
Bilbo Posted January 22, 2016 Posted January 22, 2016 There you go - I'm with Prof Kunchur - 192kHz does matter! At least that gives me a plausible reason to back why I think the high def files sound better. At least on my system anyway. ☺️
Guest Eggcup The Daft Posted January 22, 2016 Posted January 22, 2016 http://www.stereo.net.au/forums/index.php/topic/85863-realistic-frequency-response-in-room/?p=1408137 From the link, you can find and view the powerpoint presentation Slides from the AES convention in Banff on intermodulation distortion in loudspeakers and its relationship to "high definition" audio. It seems that even if the system handles high frequency information, you'll have a hard time getting to hear it. And it appears to me that this work supports davewantsmoore on resolution and perception with standard definition audio. As an aside, the more I think about it, the more I mistrust "moving the loudspeaker" as an acceptable substitute for "proper" tests that record and digitise the sound difference we are trying to perceive. Can anyone guarantee that the loudspeaker moving tests are giving the right result?
a.dent Posted January 22, 2016 Posted January 22, 2016 [quoting LHC's excerpt from Professor Kunchur's FAQ, above, in full] This all seems logical and sounds believable. The only problem is, are microphones capable of recording to 96kHz? If they cannot be relied upon to record accurately at that frequency, what is the value of 24/192 recordings? We would just be introducing a different type of distortion to the recording.
Guest rmpfyf Posted January 22, 2016 Posted January 22, 2016 "Hence the temporal resolution of the CD is inadequate for delivering the essence of the acoustic signal (2 distinct peaks)." Nope... the 'essence of the acoustic signal', where music is concerned, is not the ability to resolve discrete peaks at 5us difference. Read the experiments in the context in which they were conducted. Then get to a concert and go work out when and where this actually occurs. "We are on the same page here. I agree that the bulk of timbral information is below 20kHz but there is a minor amount that exists above that threshold" There is no doubt that spectral content exists above 20kHz, any solid recording can show as much. Unless you're recording a hard shot on a whistle for a dog, high frequency content remains typically low energy. This still doesn't mean you can hear any frequency content above ~20kHz; we have no psychoacoustic response. This isn't the same phenomenon as 'attack', the ability to hear and characterize timing. For which, if you needed a more resolute timebase, you'd certainly not need it all the time (which is where MQA starts - time resolution where you need it only). It's a limited application - we cannot hear a 50kHz waveform even if such a thing could be played back on a 192kHz system. Unless your system is mega efficient and of an appropriate amp design (not a Class D), I'd doubt you could recreate the difference. For the vast majority of systems, what we're hearing is more relaxed filtering at the top end, less group delay effect, etc.
huxmut Posted January 22, 2016 Posted January 22, 2016 I tend to move when listening to music. I nod my head in time to the beat. Surely any of this special, only available in 192kHz, timing information is lost at this point. And yet this head movement occurs just when the music sounds the best. Just an observation 1