Luminance and chroma sampling
CRT over scan
16 : 9
Before attempting to explain pixel aspect ratio, the origins of analogue TV standards need to be explained. The reason for this is simple: the digital standards were designed to be compatible with the older analogue standards. That is, they were designed such that it is possible, and indeed simple, to digitise analogue recordings. Had they been designed completely independently of the old analogue standards, this might not have been possible, or might have been extremely complex. Unfortunately, this put the digital standards in a straitjacket, as they were constrained by some old and sometimes arcane analogue standards!
Firstly, let’s look at the analogue video standards.
The two main analogue video transmission standards are NTSC and PAL.
There are others, but they are not dealt with in this site.
Each standard is based on the principle of transmitting the video and audio on a carrier wave at different frequencies. The carrier wave is received by the receiver, the sound is separated from the video and the latter is sent to the cathode ray tube (assuming a CRT display). The audio signal is sent to an amplifier and comes out as sound.
In the early days of black and white TV, the video signal consisted only of luminance information. That is, how bright or dark the picture should be at any moment in time. Of course, the luminance value changed rapidly with respect to time.
How does the CRT display the signal on a black and white television? Basically it sends a beam rapidly across the screen from left to right (as you look at it) in a straight line. It then switches off (this is called “horizontal blanking”), returns the beam back to the left of the screen, moves down a tiny bit, switches back on and does the same again. If it is done fast enough, it is possible to fill the whole screen (called a “frame”) in a fraction of a second. When the beam gets to the bottom of the screen, it switches off (this is called “vertical blanking”), moves back to the top, switches back on and starts again.
DIAGRAM OF SIMPLE SCANNING
Clearly, this raises the questions of how many lines there are on the screen and how fast the beam actually moves. The answer to this is where the differences between NTSC and PAL become apparent. PAL is much easier to explain, so let’s start with NTSC!
In the early days of NTSC, it was decided that a whole screen should be drawn at exactly half the local AC mains frequency and that there should be 525 lines. The AC mains frequency in North America was (and still is) 60 Hz. So, a whole screen would be drawn with a frequency of 30 Hz.
The same rationale was used for PAL (which came after NTSC) that the whole screen should be drawn at exactly half the local AC mains frequency but that there should be 625 lines, not 525. The AC mains frequency in Europe was (and still is) 50 Hz. So, a whole screen would be drawn with a frequency of 25 Hz.
The mains frequency was chosen as it was a handy timing signal that already existed and so it was easy for the television manufacturers to use this signal. In addition, using the mains frequency made it easier to avoid any interference signals with the mains itself. Why half the AC frequency and not the whole AC frequency? To answer this, we first need to explain the term “interlacing”. (The explanation starts using a simple model and then homes in on what actually happens. As a result, the early descriptions are deliberately over simplistic but become more realistic as the explanation moves on.)
Instead of drawing the top line on the screen, then the second, then the third and so on to the 525th line (625th line with PAL), the CRT draws the first line, then the third, then the fifth and so on down to the 525th line (625th line with PAL). (Later, when we get onto blanking, we discover that in fact the first few lines aren’t actually drawn at all.) It then blanks vertically, moves back to the top and draws the second line, then the fourth, then the sixth and so on to the 524th (624th with PAL) line. It then blanks back to the top and starts at the first line again. The consequence of this is that, in effect, the CRT draws half the screen in one scan and then goes back to the top and fills in the gaps. Each half-screen is called a “field” and this method of drawing is called “interlacing”. Each frame is therefore made up of two fields.
DIAGRAM OF A TWO-FIELD PICTURE MADE INTO A FRAME
It is now possible to answer the question “Why half the AC frequency and not the whole AC frequency?” The answer is that the CRT is effectively moving from top to bottom every field, not every frame, and it does this 60 (50 with PAL) times a second, which matches the AC frequency perfectly.
Why interlace and not just draw all lines in one hit? Mainly to reduce the flicker that 30 frames per second would suffer from if drawn in one hit. Interlacing reduces this. As technology progressed, it became possible to draw each line in turn without interlacing and without flicker (usually by doubling the frame rate). This is called “progressive scan”, but it is not covered further in this site.
Notice that I used the terms “first line”, “fifth line” and so on instead of “line 1”, “line 5”? This is because the lines are numbered in the order that they are drawn in, NOT the order that they appear on the screen in. So in fact, the CRT draws lines 1 to 525 (625 with PAL) in sequence. Also note that the line numbering does not necessarily start from 1. It is worth remembering that line scanning is just a series of electrical signals: there’s nothing intrinsic in the analogue specifications that actually numbers the lines. Numbering is just an artificial construct for our convenience and it only really became important when we needed to count the lines for the purposes of converting them to a digital signal. One of the standards, called “ITU-R BT.470” (http://www.itu.int/rec/recommendation.asp?type=folders&lang=e&parent=R-REC-BT.470 ), lists the line numbering as follows.
So in fact, NTSC doesn’t start at line 1 anyway! Also, remember that although this numbering scheme implies that the fields are not the same size, this is the digital world’s perspective of line numbering. In practice in the old black and white analogue world, each field was the same size:
The half lines correctly imply that line drawing starts halfway through one of the lines. This is not only possible, but also makes sense when it is considered that the lines were not exactly horizontal: they are slightly tilted downhill from left to right and so one of the lines starts half way across the screen.
DIAGRAM SHOWING THIS
More important than worrying about where one field starts and the next stops, or about actual line numbers, is to ask: are all the 525 lines (625 for PAL) used for video and audio signals? To answer this we have to look at the horizontal and vertical blanking in more detail.
Firstly we need to define the “horizontal scan frequency”. This is the frequency with which the CRT beam goes from left to right and back again horizontally. (We already know the frequency that it does this vertically: 60 Hz for NTSC and 50 Hz for PAL.) We can calculate this for both NTSC and PAL easily. For black and white NTSC, the CRT paints the whole screen 30 times a second and each screen consists of 525 lines. So, every second, it draws a line and moves the beam back to the left 30 x 525 = 15750 times, giving 15750 Hz. For PAL, the CRT paints the whole screen 25 times a second and each screen consists of 625 lines, so every second, it draws a line and moves the beam back to the left 25 x 625 = 15625 times, giving 15625 Hz. Note that the horizontal scan frequency is a consequence of the specification of having 525 lines (625 for PAL). That is, the specification came first and the horizontal scan frequency was simply a mathematical consequence of this. Taking one over the frequency, we can also calculate the time taken for each line to be drawn. For NTSC, it is 1 / 15750 = approximately 63.492 us. For PAL, it is 1 / 15625 = 64 us.
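The arithmetic above can be checked with a short Python sketch. The function names are illustrative only and are not part of any standard; exact fractions are used so that no rounding creeps in, and line times are expressed in microseconds.

```python
from fractions import Fraction

def line_rate_hz(frames_per_second, lines_per_frame):
    """Horizontal scan frequency: lines drawn (including flyback) per second."""
    return Fraction(frames_per_second) * lines_per_frame

def line_time_us(rate_hz):
    """Time for one complete line in microseconds (one over the frequency)."""
    return Fraction(1_000_000) / rate_hz

ntsc_rate = line_rate_hz(30, 525)  # black and white NTSC
pal_rate = line_rate_hz(25, 625)   # PAL

print("NTSC:", ntsc_rate, "Hz,", float(line_time_us(ntsc_rate)), "us per line")
print("PAL: ", pal_rate, "Hz,", float(line_time_us(pal_rate)), "us per line")
```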
DIAGRAM SUMMARISING THESE VALUES FOR NTSC AND PAL
So, we could simply tell the CRT “keep drawing lines 15750 times a second displaying whatever signal happens to come along as you’re doing it”. Seeing as we have defined in our standards that we paint 525 lines 30 times a second, one might think that this was a good idea. Unfortunately, electronics components come with a “tolerance”. This is a statement of how accurate the components are. For example, a resistor may be rated at 200 ohms. However, this does not mean that it is exactly 200 ohms, only that it is guaranteed to be 200 ohms plus or minus some small amount, for example 5%. (That is, it is between 190 and 210 ohms.) Similarly, we can’t guarantee that the CRT beam moves at exactly 15750 Hz, only that it is very very close to this. Now suppose that we let the CRT draw lines at what we think of as 15750 Hz, but which is in fact only 15749.999999 Hz, for example due to the tolerance of the electronic components. The picture will be skewed and will appear more skewed across the screen with time. So, to account for this, synchronisation pulses are used. A synchronisation pulse is a “tick” pulse (like a metronome) that is part of the transmission. In simple terms, this “synchronisation pulse” tells the CRT when it can start to draw a new line.
There is also another kind of pulse that tells the CRT to stop drawing. This is, not surprisingly, called the “blanking pulse”. In practice, the synchronisation pulses and blanking pulses are separate but both form part of the video signal, though this is not important. The important point here is that as well as the actual picture signal, there are at least two other signals:
On top of this, there is a less-frequent vertical synchronisation pulse and a vertical blanking pulse. Seeing as both the horizontal blanking pulse and vertical blanking pulse tell the CRT to stop drawing, they share the same electrical properties.
So, if we were to convert the video transmission into words, it might go something like:
The important thing about the horizontal tick pulses is that they carry on even when a vertical blanking signal is being sent out. One way to think of this is like a drummer playing to a click track on a metronome. Even when the drummer isn’t playing, the click track continues, ensuring that the drummer next comes in at the right place. The NTSC standard (and the PAL standard) take this into account and so what they really specify is the number of horizontal synchronisation pulses per frame to be 525 (625 for PAL), not the number of visible lines to be drawn on the television screen. Some of the synchronisations (or “lines” as they are still called even though we now know that lines aren’t necessarily drawn) are “wasted” on vertical blanking and indeed other control signals, which means that not all the lines are used for actual video image. The period of time during which the CRT is moving back to the top and not drawing active picture is called the “vertical blanking period”, even though blanking isn’t the only thing going on in these intervals.
So how many lines are used for picture? The simple answer is as follows:
This is the “simple answer” because the half lines do change from field to field. In fact, NTSC uses a four-field sequence during which the actual line numbers that start and end the active picture vary slightly from field to field. To make it even more complex, PAL uses an 8-field sequence. Therefore, it is best not to get bogged down with exactly which lines are active and which are not and just stick with the averages:
This was a very long explanation of vertical blanking! However, we now need to look at horizontal blanking. Whilst it is easy to nominate a certain number of lines for blanking and other control signals, doing the same for horizontal blanking is somewhat arbitrary. Therefore, it is left to the standards committees to decide how much time should be allowed to get the beam back to the left hand side of the screen (and other control signals) before drawing the next line. For PAL, this was defined as 12 us for blanking and 52 us for picture, which makes up the required total of 64 us for each line. For NTSC (in the days of black and white) this is defined as anything between 10.2 us and 11.4 us for blanking and the remainder for picture.
Let’s summarise what we have thus far.
DIAGRAM INCLUDING ALL VALUES (STANDARDS AND CONSEQUENT) THUS FAR
Unfortunately, with the advent of colour, the NTSC horizontal scan frequency of 15750 Hz ran into a problem. (PAL did not run into this problem. It has been stated on some sites that PAL, which came after NTSC, was designed such that it deliberately did not run into this problem.)
The challenge to the designers of the standard for transmitting a colour signal was how to do it in such a way that when a colour transmission was sent to a black and white television, it would display in black and white (as opposed to not displaying anything or displaying a scrambled picture) without interference. To do this, they had to use the same carrier frequency and leave the luminance and audio signals untouched. The new colour part of the signal, called the “chroma signal”, somehow had to be added to this signal without it interfering with either the luminance or the audio signals. In the first attempts, they tried a “suck it and see” approach of simply selecting a frequency within the available bandwidth, squeezing the chroma signal in and seeing what happened. Unfortunately, it became evident that it was not possible to do this because the chroma signal kept on interfering with the audio signal.
Further study showed the problem to be to do with the horizontal scan frequency of 15750 Hz. It was calculated that if the chroma sub-frequency were an odd multiple of half the horizontal scan frequency, then interference would be greatly reduced. (The reasoning for this is outside the scope of this simple explanation!) After much experimentation and number-crunching with frame rates, horizontal scan frequencies and so on, it was concluded that a horizontal scan frequency of 4500000 / 286, roughly 15734.265734265734… Hz, and a chroma sub-frequency of roughly 3.579545454545… MHz would do the trick!
DIAGRAM SHOWING HOW THIS IS DERIVED: 4.5 x 10^6 * 2 / (455 + 117) Hz AND 4.5 * 455 / (455 + 117) MHz
If the horizontal scan frequency were set to this, the chroma signal could be squeezed into the existing bandwidth with almost no noticeable effects.
As 15734.265734265734… Hz is close to the original 15750 Hz, any difference should not be noticeable. Let’s see if this is the case. The number of lines had to be kept the same at 525, but now we’re only drawing 15734.265734265734… lines per second, so we’re only drawing (4500000 / 286) / 525 = 30000 / 1001 = 29.970029970029… frames per second. Originally, it was 30 frames per second, so the difference is tiny. Unfortunately, it is also significant and it is explained further below! The television manufacturers weren’t entirely happy with the new frame rate because it meant that the AC frequency could no longer be used as a clock signal. Therefore, they had to add new clocks to the televisions that “ticked” at 30000 / 1001 Hz and use this as a timing signal instead.
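As a rough check of these figures, the sketch below recomputes them with exact fractions. It assumes, as in the figures quoted above, that the line rate is 4500000 / 286 Hz and that the chroma frequency is 455 times half the line rate; the variable names are made up for this example.

```python
from fractions import Fraction

# Colour NTSC horizontal scan frequency quoted in the text: 4500000 / 286 Hz.
line_rate_hz = Fraction(4_500_000, 286)

# Chroma frequency as an odd multiple (455) of half the line rate.
chroma_hz = 455 * line_rate_hz / 2

# 525 lines per frame are kept, so the frame rate follows from the line rate.
frame_rate_hz = line_rate_hz / 525

print("line rate :", float(line_rate_hz), "Hz")
print("chroma    :", float(chroma_hz) / 1e6, "MHz")
print("frame rate:", frame_rate_hz, "=", float(frame_rate_hz), "Hz")
```

Keeping the values as fractions shows that the frame rate is exactly 30000 / 1001 rather than a rounded decimal.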
It is often said that the NTSC frame rate is exactly 29.97. This, of course, does not equal 29.970029970029… but it is very close. Where does the 29.97 come from and which is right? The simple answer is that 29.970029970029… is right and that 29.97 is an approximation. However, 29.97 is an approximation that is not just a rounding of 29.970029970029… It comes from the addition of time codes to the video.
Suppose I wanted to give a video to a colleague so that he could add a sound effect. I might say: “Start the sound effect at the bit where Arnie blasts the T1000 to smithereens”. This is not, however, the most accurate way to represent the point at which an event should happen! Therefore a system of numbering each frame was invented. The simplest way would be to start the counting at 0 (or some other integer) and simply count each frame one by one. So, I might say “start the sound effect at frame 16589483632”. If you have this information, along with the rate at which frames pass, you could add a sound effect without even needing the original video. However, in practice a different system is used, called SMPTE time code. SMPTE simply counts hours, minutes, seconds and the frame number (starting from 0) in that second. So the first frame is 00:00:00:00 and the thirtieth frame is 00:00:00:29. Bearing in mind that in black and white NTSC the next frame would start in the next second, the thirty-first frame would be 00:00:01:00, the thirty-second 00:00:01:01 and so on. This tying-in of the frame counter with a real-time clock has its advantages in that it is now possible to think of the frame count as a time line (which indeed it is). We easily know, for example, that the third frame in the eleventh minute is 00:11:00:02. Had we simply counted the frames from 0, we’d have to work this out with a calculator! SMPTE for PAL is the same except that because there are only 25 frames per second, the last number in the time code only goes up to 24.
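The counting scheme just described can be sketched in a few lines of Python. This is only an illustration of the plain (non-drop) numbering for whole-number frame rates; the function name and the convention of counting frames from index 0 are assumptions made for the example.

```python
def frame_to_timecode(frame_index, fps=30):
    """Convert a 0-based frame index to an hh:mm:ss:ff time code.

    fps is the whole number of frames per second: 30 for black and
    white NTSC, 25 for PAL.
    """
    total_seconds, frames = divmod(frame_index, fps)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

print(frame_to_timecode(0))           # first frame
print(frame_to_timecode(29))          # thirtieth frame
print(frame_to_timecode(30))          # thirty-first frame
print(frame_to_timecode(24, fps=25))  # last frame of the first PAL second
```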
What about SMPTE time codes and colour NTSC? This is where the problem is. Suppose we have an NTSC colour video that lasts exactly one hour and which has a frame rate of 29.970029970029… Now suppose we use the same SMPTE time coding that we used for black and white. Firstly, how many frames do we have? 29.970029970029… x 60 x 60 = 107892 (rounding down to the nearest integer) frames. Now let’s add the SMPTE time codes to these frames. We start at 00:00:00:00 and end with the last frame having a code of 00:59:56:11, whereas we’d like it to have a time code of 00:59:59:29. This might not be a problem if we told a sound engineer “start the sound effect at time code 00:12:13:14”, but the engineer might be doing the sound effects without access to the video. Instead, they may be using their own accurate time source with the intention of “sticking the two together” at the end. (This is often how sound is added to video.) So if the engineer were to start the sound at time code 00:12:13:14 (using their own time source), it would not be in the right place when the sound was added to the video. Therefore, a variant of SMPTE was created called “drop-frame time code”. Despite what the name implies, this does not drop any frames! Instead, it skips certain time codes. In particular, it skips frame numbers 00 and 01 at the start of each minute, except where the minute is 00, 10, 20, 30, 40 or 50. So, the frame numbering sequence might go:
And so on to:
Taking a 10-minute section of video, we would skip 9 x 2 frame numbers = 18 frame numbers. Over a one hour section of video, we skip 108 frame numbers. These 108 skipped frame numbers bring the 107892 frames up to 108000, which is exactly what we want it to be!
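The skipping rule can be tried out with the following Python sketch. It is one possible way of turning a 0-based frame index into a drop-frame label, written for illustration rather than taken from any standard; the function name is hypothetical and the colon-separated format simply follows the notation used here.

```python
def frame_to_dropframe_timecode(frame_index):
    """Label a 0-based frame index using drop-frame numbering.

    Frame numbers 00 and 01 are skipped at the start of every minute,
    except minutes 00, 10, 20, 30, 40 and 50. No frames are discarded;
    only labels are skipped.
    """
    frames_per_10_min = 10 * 60 * 30 - 9 * 2  # 17982 real frames per 10 minutes
    frames_per_min = 60 * 30 - 2              # 1798 real frames in a "skipping" minute

    ten_min_blocks, rem = divmod(frame_index, frames_per_10_min)
    if rem < 1800:
        # Still inside the first minute of the block, which skips nothing.
        skipped_in_block = 0
    else:
        skipped_in_block = 2 * ((rem - 1800) // frames_per_min + 1)

    # Add back every label skipped so far, then count as plain 30 fps.
    label = frame_index + 18 * ten_min_blocks + skipped_in_block
    total_seconds, frames = divmod(label, 30)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

for index in (1799, 1800, 17981, 17982, 107891):
    print(index, frame_to_dropframe_timecode(index))
```

The last index, 107891, is the final frame of the 107892-frame hour discussed above.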
So, if we now apply the drop-frame time code to a sequence of video frames, we get exactly 107892 x 30 / 108000 frames per second = 29.97. Of course, this assumes that the device that is playing back the video is playing it faithfully according to the time code and not with some prior knowledge of the “actual” frame rate. This raises two questions.
The first is easy to answer: we know that with drop-frame SMPTE time code, the frame numbering drifts from real time but is then pulled back into line by the drop-frame numbers. The most that it is ever out is two frames. Therefore, when the sound is added, it will be at most two frames out, or approximately 1/15th of a second. This is deemed acceptable for most purposes.
The second is slightly more interesting because we know that 29.97 is still an approximation to 29.970029970029… If we take a 24 hour video instead of 1 hour, the number of frames is 29.970029970029… x 60 x 60 x 24 = 2589410 (rounding down). However, assuming 29.97 to be the frame rate would give us 29.97 x 60 x 60 x 24 = 2589408 frames. The difference is two frames in 24 hours. Put another way, if we applied drop-frame SMPTE to the 2589410 frames, we’d incorrectly conclude that the 24 hours were up after 2589408 frames, and the last two would be “lost”, which in the case of most studios is at the outer limit of acceptability. (Most would want to resynchronise the time code with the video after 12 hours, but fortunately, there aren’t many unbroken video sequences that long!) Therefore, 29.97 is deemed “accurate enough” when it comes to time codes, though bear in mind that if we’re simply transmitting a video signal at one end and playing it back at the other, we’re not worried about time codes and both ends will be working at 29.970029970029… anyway, so no problems will be encountered.
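The two-frame discrepancy can be reproduced with exact arithmetic. The sketch below is only a check of the numbers quoted above; the variable names are invented for the example.

```python
import math
from fractions import Fraction

exact_rate = Fraction(30000, 1001)   # 29.970029970029... frames per second
approx_rate = Fraction(2997, 100)    # 29.97 frames per second
seconds_per_day = 24 * 60 * 60

frames_exact = math.floor(exact_rate * seconds_per_day)
frames_approx = math.floor(approx_rate * seconds_per_day)

print(frames_exact, frames_approx, frames_exact - frames_approx)
```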
Whilst talking about how exact we have to be, it is worth adding that because the equipment used to do all this is not as accurate as our theory, the specification allows the equipment to work within certain “tolerances”. The tolerances are as follows.
Rather usefully, 29.97 and 29.970029970029… both fit within this allowed tolerance anyway, so the differences, from a practical point of view are negligible. This site, however, still assumes an NTSC colour frame rate of precisely 29.970029970029… unless otherwise specified.
Let’s take another look at the effect this has on horizontal blanking and timings. To refresh, PAL is unaffected by the addition of colour and continued to work at a horizontal scan frequency of 15625 Hz and a line time of 64 us and an active picture time of 52 us.
NTSC, however, now works at a horizontal scan frequency of 15734.265734265734… Hz. This equates to a line time of 63.55555555555555… us. Again, it is left to the standards committees to decide how much time should be allowed to get the beam back to the left hand side of the screen (and other control signals) before drawing the next line. For NTSC (colour) this was defined as 10.9 us for blanking and the remaining 52.65555555555… us for picture.
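A quick way to confirm these timings is to compute them as exact fractions of a microsecond. The names below are illustrative, and the 10.9 us blanking figure is simply the value quoted above.

```python
from fractions import Fraction

line_rate_hz = Fraction(4_500_000, 286)        # colour NTSC line rate
line_time_us = Fraction(1_000_000) / line_rate_hz
blanking_us = Fraction(109, 10)                # 10.9 us horizontal blanking
active_us = line_time_us - blanking_us         # what is left for picture

print(float(line_time_us), float(active_us))
```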
Let’s summarise all these values so far.
DIAGRAM SHOWING ALL THE VALUES SO FAR
So, we now have an analogue signal that we understand and that we want to turn into digital data. How do we go about it? The first thing that we must be clear about is exactly what we are going to digitise. Every line? All of the line? Only the active picture? What about the half lines? This section answers those questions.
The first thing to say is that there is no point in digitising the non-active parts of the signal (blanking and so on), because they only control how the CRT operates. In the digital domain, there is no CRT to control, so digitising non-active picture would be pointless. So, only the active picture component is digitised. However, as we know, some lines are only half length. What happens to these?
To recap the number of analogue lines in each of NTSC and PAL:
Digitising half-lines would be a real headache for those that write standards, so the first thing that was agreed on is that half lines are treated as whole lines. Therefore, from the perspective of sampling, each PAL field has 288 lines and each NTSC field has 243 lines. This makes the field sizes as follows.
It makes a great deal of sense to convert each line to a horizontal row of pixels, and indeed this is what is done. So, digitised PAL has 576 rows and digitised NTSC has 486 rows. What about the number of columns?
One of the many standards for sampling of analogue video, “ITU-R BT.470”, states that (paraphrased):
Notice that it doesn’t make any preconditions about how many pixels are used horizontally to achieve this, only that when the active sample line is sampled at a predetermined rate, a 4:3 aspect ratio must result.
What if the sample line is not 52 us for PAL (or 52.65555555555… us for NTSC)? What does the standard have to say about that? Fortunately, the next few sections don’t need to worry about that because they deal exclusively with the theoretical world of sample widths of 52 us for PAL and 52.65555555555… us for NTSC. Eventually, however, we will be forced to take this possibility into account. Let’s answer the question briefly for now, however. The consensus seems to be that the width of the display is adjusted on a pro rata basis assuming that the pixel aspect ratio (“PAR”) doesn’t change. (Actually, most sites don’t seem to realise that they are making the assumption about the PAR not changing, but an analysis of the maths behind them shows that this assumption is indeed being made.)
A simple example: if the scan width of a PAL picture is 104 us, then the DAR will be 8:3. More generally,
For now, let’s get back to standard scan widths.
Suppose that as a result of sampling the active PAL picture at some unknown sample rate, we ended up with 1536 pixels. Clearly, if we have a pixel aspect ratio of exactly square (1:1), that is, if each pixel is square, we’ll end up with a DAR of 1536:576, which is certainly not the same as 4:3 and is therefore not ITU-R BT.470-6 compliant. Therefore, we need to define our pixels such that the picture scales to 4:3. If the pixel aspect ratio (“PAR”) is represented as X:Y, we can state that:
(1536 * X) / (576 * Y) = 4 / 3
If we arbitrarily state that Y = 1, we get:
(1536 * X) * 3 = 576 * 4, therefore
X = (576 * 4) / (1536 * 3), therefore
X = 0.5
So, in our fictitious example, our PAR is 0.5:1, or more conveniently written as 1:2. In English, the pixel is twice as high as it is wide. Going back to our example to check that this does indeed give a 4:3 DAR, if we have pixels that are twice as tall as they are wide and 1536 pixels across, then that is the same DAR as 1536 * 1 : 576 * 2 = 4:3. Excellent – we have a system that is compliant with “ITU-R BT.470”! Notice that we defined the PAR, not the standard. The PAR was simply a consequence of the standard – you could say that we fudged our definition of the PAR to make the scan standards-compliant.
Let’s generalise this equation.
If the PAR is X:1, then we can calculate X as follows:
X = (576 * 4) / (P * 3), or more simply
X = 768 / P, where:
P is the number of pixels that represent the active PAL picture if a standard scan width of 52 us has been used.
This equation is for a height of 576 pixels. More generally for a height of H pixels, we can say that:
We’ll use this equation later.
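As a sketch of the general form, replacing 576 with a height of H pixels in X = (576 * 4) / (P * 3) gives X = (H * 4) / (P * 3). The Python below assumes that generalisation and a 4:3 target DAR; the function and parameter names are made up for illustration.

```python
from fractions import Fraction

def pixel_aspect_ratio(pixels_in_standard_width, height, dar=Fraction(4, 3)):
    """Return X for a PAR of X:1.

    pixels_in_standard_width is P, the number of pixels that represent
    the active picture at the standard scan width; height is H, the
    number of rows; dar is the display aspect ratio to be achieved.
    """
    return Fraction(height) * dar / pixels_in_standard_width

# The fictitious PAL example: 1536 pixels across, 576 rows.
print(pixel_aspect_ratio(1536, 576))
```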
Let’s do the same for NTSC. Suppose, therefore, that as a result of sampling the active NTSC picture, we ended up with 972 pixels. Clearly, if we have a pixel aspect ratio of exactly square (1:1), we’ll end up with a DAR of 972:486, which is certainly not the same as 4:3! Again, we need to define our pixels such that the picture scales to 4:3. If the pixel aspect ratio (“PAR”) is represented as X:Y, we can state that:
(972 * X) / (486 * Y) = 4 / 3
If we arbitrarily state that Y = 1, we get:
(972 * X) * 3 = 486 * 4, therefore
X = (486 * 4) / (972 * 3), therefore
X = 0.666…
So, our PAR is 0.666…:1, or more conveniently written as 2:3. So, if each pixel is one and a half times as tall as it is wide, we’ll be compliant with “ITU-R BT.470”.
Let’s generalise this equation.
If the PAR is X:1, then we can calculate X as follows:
X = (486 * 4) / (P * 3), or more simply
X = 648 / P, where:
P is the number of pixels that represent the active NTSC picture if a standard scan width of 52.6555… us has been used.
In its general form:
It’s the same as for PAL, which is to be expected.
Going back to the nasty situation of the scan width not being the standard lengths – what are the PARs then? Easy! Remember that when we were calculating the DAR, we assumed that the PAR stays the same even if the scan width changes – that is, more of the line is sampled and more pixels are output leaving the PAR as it is. So, in fact the PAR is independent of the scan widths. In this case, we can generalise the statements about PAR to the following:
Notice now that we don’t need to make any statements about the total scan width, only the number of pixels in the standard scan width. If you want some “proof” that the total scan widths do indeed cancel each other out in the equation, then this is for you:
We’ll take PAL as an example and start again from first principles. Let w be the scan width in us and let P be the number of horizontal pixels in 52 us of the picture. Let p (small p) be the total number of horizontal pixels on the picture.
p / P = w / 52
DAR = 4w / 156
What is the PAR? It is:
(picture width / number of pixels in width) / (picture height / number of pixels in height)
(picture width / picture height) = DAR. So,
PAR = DAR / (number of pixels in width / number of pixels in height)
PAR = DAR / (p / 576)
Substituting p = Pw / 52:
PAR = DAR / ((Pw / 52) / 576) = DAR / (Pw / 29952)
PAR = (4w / 156) / (Pw / 29952)
PAR = 119808w / (156Pw)
We can cancel out the w (at last!)
PAR = 119808 / (156P)
PAR = 768 / P
This proves that the PAR is dependent not on the scan width, but on the number of pixels that are in the standard 52 us of the horizontal line. The same proof can easily be applied to NTSC as well.
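For those who prefer numbers to algebra, the cancellation can be checked numerically. The sketch below follows the same steps as the proof for PAL (p / P = w / 52, DAR = 4w / 156, PAR = DAR / (p / 576)); the function name is hypothetical and P = 702 is just a sample value.

```python
from fractions import Fraction

def par_for_scan_width(w_us, P, standard_us=52, height=576):
    """PAR for a scan width of w_us, assuming the PAR is unaffected by w."""
    p = Fraction(P) * w_us / standard_us        # total pixels: p / P = w / 52
    dar = Fraction(4, 3) * w_us / standard_us   # DAR = 4w / 156
    return dar / (p / height)                   # PAR = DAR / (p / 576)

# Whatever scan width is chosen, the result is 768 / P.
for w in (26, 52, 60, 104):
    print(w, par_for_scan_width(w, 702))
```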
“Hold on”, I hear you say, “I’ve seen plenty of sites that show the PAR being related to the scan width”! Let’s test this by substituting little p for big P in our equation. This gives us for PAL:
PAR = 768w / (52p)
Ah ha! We have now got the PAR in terms of the scan width, right? Well, yes and no. The number of pixels in the scan width equals the sample rate multiplied by the scan width. Formally,
p = fw, where f is the sample rate
If we substitute this into the PAR equation, we get:
PAR = 768w / (52fw), and again the w cancels out, giving
PAR = 768 / (52f)
So again, we can’t really get the PAR to depend on the scan width if we continue to assume that increasing the sample width leaves the PAR alone (which is fairly obvious really), though we can get it to depend on either the number of pixels in the standard sample width or the sample rate (which of course depend on each other). To summarise, we can express the PAR in terms of “P”, the number of pixels in the standard sample width:
Or we can express it in terms of the sample frequency “f”, where f is the sample frequency in megahertz:
NTSC: PAR = 648 / (52.6555… * f)
PAL: PAR = 768 / (52 * f)
Expressed as fractions and in the more generic form that doesn’t assume the number of lines:
This last formula is really useful. For example, imagine we have an NTSC device that has a sample rate of 9 MHz. What is the PAR?
PAR = 486 * (4 / 3) / (52.6555… * 9) = 6480 / 4739
This is, in fact, the PAR of an NTSC SVCD.
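The frequency form of the formula lends itself to a small helper. The following is a sketch under the same assumptions as above (4:3 DAR, PAR unaffected by the sample width); the function and parameter names are invented for the example.

```python
from fractions import Fraction

def par_from_sample_rate(f_mhz, active_us, height, dar=Fraction(4, 3)):
    """PAR (as X in X:1) from the sample rate in MHz.

    active_us is the standard active line time in microseconds and
    height is the number of active rows.
    """
    return Fraction(height) * dar / (Fraction(active_us) * f_mhz)

ntsc_active_us = Fraction(572, 9) - Fraction(109, 10)  # 52.6555... us

# The 9 MHz NTSC example from the text.
print(par_from_sample_rate(9, ntsc_active_us, 486))
```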
All this time, the fact that we are assuming that a non-standard sample width doesn’t affect the PAR has been emphasised. Is this indeed what happens? Unfortunately, in some cases, the PAR is affected by the sample width and this assumption is therefore false. This doesn't mean that the formulae above are invalid, simply that we must remember what assumptions were made when creating them. Later on, we will take into account the effect of a PAR that is altered as a result of the sample width when we look at capture cards. In fact, the equations for the PARs are slight modifications of the ones above, so all is not lost.
So, how many pixels do we get in the real world and what is the actual sample rate? Let’s look at these questions next.
We can actually sample the analogue signal as often as we like. For example, we could sample a PAL signal once every 26 us. However, for an active line of 52 us, we only get two samples per active line, or two horizontal pixels per active line. This is unlikely to yield a very good picture! (We’d get a PAR of 384:1 – very short and fat pixels as well.)
If we increase the sample rate to, say, once every millionth of a microsecond (very fast indeed, in other words), we’ll end up with 52 million samples of the active line. As well as consuming vast processor and bandwidth resources, we also have a PAR problem, because our pixels will have a ratio of roughly 14.77:1000000! These would be unfeasibly thin pixels to implement in any system.
Therefore, the sample rate has to be one that gives us sufficient resolution, but doesn’t go “over the top”. Clearly, there’s no perfect answer, but most importantly, we want to pick a sample rate that everyone else is using. Therefore, we need a standard. Cue “ITU-R BT.601”! (http://www.itu.int/rec/recommendation.asp?type=items&lang=e&parent=R-REC-BT.601-5-199510-I) This states that (paraphrased):
This makes the sample rate easy to calculate, because with PAL, the total line length is 64 us, and for NTSC, it is 63.555… us. So, we can calculate the sample rate as follows:
Magic! Both come out with the same sample rate. Of course, when the standards were written, this was by design, hence the numbers 864 and 858. So, the sample rate is 13.5 MHz. This 13.5 MHz turns up just about everywhere when it comes to digitising analogue video signals. There are other sample rates relating to other standards, but the 13.5 MHz figure is by far the most widely used, and the best thing is that it is independent of NTSC or PAL.
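The sample rate arithmetic can be sketched in a few lines of Python. This assumes that 864 and 858 are the number of samples per total line for PAL and NTSC respectively, and that the NTSC line length of 63.555… us is exactly 572 / 9 us; the variable names are purely illustrative.

```python
from fractions import Fraction

# Total line lengths in microseconds (PAL: 64, NTSC: 63.555... = 572 / 9)
PAL_LINE_US = Fraction(64)
NTSC_LINE_US = Fraction(572, 9)

# Samples per total line (assumed meaning of the 864 and 858 figures)
pal_rate_mhz = Fraction(864) / PAL_LINE_US    # samples per microsecond = MHz
ntsc_rate_mhz = Fraction(858) / NTSC_LINE_US

print(float(pal_rate_mhz), float(ntsc_rate_mhz))  # 13.5 13.5
```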
Let’s quickly summarise where we are. We sample the video signal at 13.5 MHz, and we have formulae for calculating the PAR to give us a 4 : 3 DAR. Let’s put this to the test by digitising a PAL video signal whilst staying within “ITU-R BT.470” and “ITU-R BT.601”.
To get the PAR, we need to know the length of the active picture part of the line. This is 52 us, as already mentioned. So, if we sample that part of the line at 13.5 MHz, we get 13.5 x 10^6 * 52 x 10^-6 = 702 samples. The PAR is therefore (768/702):1 = 128 / 117 : 1 (roughly 1.094017… : 1). Therefore, in this case, the pixels are slightly wider than they are tall. The formula for the PAR could be re-written as:
PAR = (576 * 4 / 3) / (13.5 * 52) = 128 / 117 ~ 1.094017…
(It might have been easier simply to use the equation that relates the PAR to the frequency. However, taking the longer route here keeps our brains sharp!)
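For anyone who prefers to let a computer do the fractions, the PAL calculation can be sketched as follows. The constant names are my own; the figures (13.5 MHz, 52 us, 576 lines, 4 : 3) are the ones used above.

```python
from fractions import Fraction

SAMPLE_RATE_MHZ = Fraction(27, 2)  # 13.5 MHz
PAL_ACTIVE_US = 52                 # active picture part of the line
PAL_LINES = 576
DAR = Fraction(4, 3)

samples = SAMPLE_RATE_MHZ * PAL_ACTIVE_US  # samples across the active line
par = PAL_LINES * DAR / samples            # width needed for square pixels / actual samples

print(samples, par)  # 702 128/117
```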
If we do the same for NTSC, we get 13.5 x 10^6 * 52.6555… x 10^-6 = 710.84999… samples. Now we have a problem: is this 710, 711 or indeed 710.84999… samples? The answer depends on the hardware doing the sampling. For now, let’s get the PAR of all three!
Firstly, 710 samples:
PAR = 648 / 710 ~ 0.912676…
Next 711 samples:
PAR = 648 / 711 ~ 0.911392…
Finally, 710.84999… samples:
PAR = 648 / (13.5 * 52.6555…)
= 648 / (27 / 2 * (286 / 4.5 – 10.9))
= 648 / (27 / 2 * (28600 / 450 – 4905 / 450))
= 648 / (27 / 2 * 23695 / 450)
= 648 / (27 / 2 * 4739 / 90)
= 648 / (127953 / 180)
= 4320/4739 ~ 0.911585…
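The three candidate PARs can be checked with a short Python sketch. The names are illustrative only; 648 is 486 * 4 / 3, and 4739 / 90 is the exact form of 52.6555… us.

```python
from fractions import Fraction

SQUARE_WIDTH = 486 * Fraction(4, 3)                    # 648 pixels for square pixels
exact_samples = Fraction(27, 2) * Fraction(4739, 90)   # 13.5 MHz * 52.6555... us

# PAR for 710 samples, 711 samples and the unrounded sample count
pars = {samples: SQUARE_WIDTH / samples for samples in (710, 711, exact_samples)}

for samples, par in pars.items():
    print(float(samples), par, round(float(par), 6))
```

Keeping the values as fractions avoids any argument about where to round until the very end.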
To find out which one is “right” we need to get back to the real world!
When it comes to sampling an analogue video signal, it has to be borne in mind that the video signal doesn’t start and stop abruptly. Also, due to the tolerance of the electronics, allowance has to be made for small variations between pieces of equipment. For example, one might be forgiven for manufacturing a PAL analogue to digital sampler that samples the active picture part of the line on the basis that it lasts 51.99999998 us!
Therefore, the standards do allow for a range of active picture times. The ranges are as follows.
If we convert these two extremes to actual times for NTSC, we get the following active picture times, and thence sample counts and PARs:
16.5% --> 53.06888… us, 716.42999…, 0.904485…
18% --> 52.11555… us, 703.55999…, 0.921030…
If we convert these two extremes to actual times for PAL, we get the following active picture times, and thence sample counts and PARs:
11.7 us --> 52.3 us, 706.05, 1.087742…
12.3 us --> 51.7 us, 697.95, 1.100365…
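The figures above can be reproduced with the sketch below. It assumes that the NTSC percentages are the horizontal blanking as a proportion of the total 63.555… us line, and that the PAL figures are blanking times subtracted from the 64 us line, which is how the numbers above appear to have been derived. The function name `active_samples_par` is hypothetical.

```python
NTSC_LINE_US = 572 / 9   # 63.555... us total line
PAL_LINE_US = 64.0
RATE_MHZ = 13.5

def active_samples_par(active_us, square_width):
    """Return (active time in us, samples at 13.5 MHz, PAR) for one active-line length."""
    samples = active_us * RATE_MHZ
    return active_us, samples, square_width / samples

# NTSC: assumed blanking of 16.5% and 18% of the total line; 648 = 486 * 4 / 3
ntsc = [active_samples_par(NTSC_LINE_US * (1 - pct / 100), 648) for pct in (16.5, 18)]
# PAL: assumed blanking of 11.7 us and 12.3 us; 768 = 576 * 4 / 3
pal = [active_samples_par(PAL_LINE_US - blank, 768) for blank in (11.7, 12.3)]

for row in ntsc + pal:
    print("%.5f us, %.2f samples, PAR %.6f" % row)
```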
So, if we summarise these values for the PAR, we get (with the mathematically “exact” values in parentheses):
Isn’t this a contradiction? I said earlier that the PAR is independent of the scan width, and now I’m showing that it changes with scan width. The answer is that in the previous section where I said that the PAR does not change with scan width, I was referring to deliberate over (or under) sampling on the part of the A to D converter. However, here, I am talking about the electronic tolerance that is allowed when attempting to sample to the exact scan widths of 52 us for PAL and 52.6555… us for NTSC. I do discuss deliberate over (and under) scanning later when we learn that in fact, this is what most video capture cards do!
When we are converting from one digital format to another, we need to know which PAR the source and destination are using. However, we clearly have a problem here because the standards that we have discussed so far allow a range of possible PARs. What should we do? Perhaps another standard is called for. :-)
SMPTE RP-187-1995 (http://www.smpte.org/shopping_cart/cart.cfm?function=add&productid=430) tries to resolve this by defining the PARs as follows:
These values are certainly within the allowed tolerances, but the standard, unfortunately, has not been adopted. It has been suggested that this is because they are too hard to adopt in practice. (http://www.lurkertech.com/lg/pixelaspect.html)
Anyone else want to have a go at a standard? :-)
When it comes to video sampling, there’s square, and then there’s square! To a mathematician, square means a quadrilateral with four right angles and four sides of equal length. However, to a member of the ITU standards committee, square can also mean a rectangle of dimensions 767 x 768! Why?
“ITU-R BT.470” states that a “square pixel” will result if you sample the active picture of a PAL signal at exactly 14.75 MHz and scale it to a 4:3 display, or sample the active picture of an NTSC signal at exactly 12.2727… MHz and scale it to a 4:3 display. Are they mathematically right? Let’s find out.
Using our handy formulae, we can calculate the PARs as follows:
As can be seen, neither of the so-called “square pixels” is actually square. However, both are probably near enough to “square” that the difference is not visible to the naked eye. (I speak for myself only here. If your name happens to be Colonel Steve Austin, you may think differently. :-) ) What effect does this have on our PARs?
Let’s go back to our mathematically “perfect” PARs:
Now, we can adjust these PARs based on the new definition of “square”. This is not a particularly intuitive process (though interestingly, there are similarities between this and the Lorentz transformation in special relativity, whereby we adjust our frames of reference to give us new dimensions in the “real world”). To do this, we simply divide the “perfect” PARs by the ITU-R BT.470 “Square Pixels” PAR (you’ll have to take my word for this as it’s not obvious):
Fortunately, both ratios are still within the allowable tolerance of PARs. Therefore, even though they are not strictly correct because they are based on “square” pixels that aren’t actually square, they do give us PARs that are within the tolerances allowed in the various standards and they are also simple ratios which should be easy to implement.
It is written in various places that the PAR of NTSC is “exactly” 10:11 (X:Y), and that the PAR of PAL is “exactly” 54:59 (X:Y). IMHO, it is not worth getting too bogged down with the fact that this is only true if we define square as a 768:767 box. I believe that it is far more important that everyone is working to the same standards, so that at least we can all converse with each other happily. :-)
One final point about PARs. Remember that apart from on displays, they don’t actually exist! OK, they exist as an abstract entity in the minds of people like us, but they don’t exist to a DVD player or a video capture card, for example. When digitizing the luminance values for a PAL video signal, we might end up with a value table something like (the “value” column is completely made up in this example):
… and so on for each line.
Notice that there is nothing in the encoding that explicitly states what the PAR is. This is indeed part of the problem in transferring data between digital sources: each source has its own “understanding” of what PAR it is working with. So, the table of information could be passed from one digital device to another and, because they work to their own in-built specifications, in which the PAR is not included, the total image could easily become stretched or squashed in one or more dimensions. For this reason, it is up to us to know what the effective PAR is on the various devices that we are dealing with and mould the picture in such a way that the various devices all end up showing the picture with the same total aspect ratio. We need to look at how we do this in more detail.
Let’s take a simple example. We have a PAL DV camera whose footage we wish to edit on our computer and then burn onto a DVD for display on a plasma screen. In this scenario, there are 4 digital devices and a whole raft of analogue to digital and digital to analogue conversions. Let’s list them, using A and D for analogue and digital respectively. I’m assuming here that the computer is using a flat screen connected by a DVI cable, and that the plasma screen is connected by an RGB cable to the DVD player.
I make that three A to D conversions and two D to A conversions – one of each of which occurs within the DVD player and which nothing can be done about*. In an ideal world, there would only be one A to D conversion at the DV tape stage.
Imagine that throughout this whole process, the source (DV tape) and destination (plasma display) PARs are the same and also that the number of horizontal and vertical pixels between the source and destination are the same. It doesn’t take much to work out that this is the ideal situation because the image will appear on the destination display device with exactly the right aspect ratio (assuming, of course, that it is correct in the source). Not surprisingly, in the real world, this rarely happens, and we do need to consider the PAR and the number of horizontal and vertical pixels at the source and destination, and calculate from these what we need to do in the edit stage to get everything looking “perfect” in the destination display. The PAR has been covered ad nauseam. What about the horizontal and vertical pixels?
[* Actually, it is possible to take a direct digital feed from the MPEG decoder using an SDI interface and feed that directly to a scaler.]
In all the discussion about PARs, I didn’t once refer to the actual number of pixels on a horizontal line that are actually output. We know that we only need to sample the active picture and that the sample rate defines how many pixels this gives us. A reminder of the simple case of digitizing an analogue PAL and NTSC picture of the standard heights, sampling only the active picture and staying within the various standards that we’ve mentioned:
Is this what is actually output? No! The ITU-R BT.601 standard actually states that we must output 720 pixels when digitising both NTSC and PAL. The reason for this will become clearer later on. As has been shown, when taking the various tolerances into account, there is a range of pixels that could represent the active picture. These are approximately:
Now the figure of 720 is looking rather handy for a number of reasons.
This raises the question of what makes up the remaining pixels. The answer is that the standard doesn’t actually specify, so it is left to the specific implementation of the A to D converter and software (driver) to decide. Sometimes, the remainder is just black pixels. However, it would be equally plausible to scale the whole thing up, whilst leaving the PAR the same, so that the whole image fits into the 720 pixels. Another possibility is to scale horizontally, but leave the vertical as is. This, of course, messes up the PAR, but unfortunately this is also quite common – particularly with capture cards! If you know that this happens, and you know the amount of horizontal scaling, it is possible to account for this when editing the video. If you don’t know, what should you do? This is harder to answer because even if you take a reference picture and transfer it to the computer and count the pixels (looking, for example, for black at either end), you are at the mercy of the software that did the transfer in the first place. Can you guarantee that it transferred the picture faithfully, or that the software that you are using is displaying the picture faithfully? There is a web site that assists in this department by playing a standard DVD and effectively counting the pixels. (http://www.doom9.org/index.html?/capture/capture_window.html) The best thing to do, therefore, is look up on the Internet what your particular A to D converter actually does and work from there – try to avoid guessing!
In all the discussion about PARs, I didn’t once refer to the actual number of vertical pixels that are actually output. In our minds, we might have assumed that for PAL, 576 lines are output and for NTSC, 486 lines are output. Is this right? Possibly.
Nearly all PAL instances of A to D conversion output 576 lines if there are 576 active lines, but most (not all) instances of NTSC A to D conversion output 480, not 486, active lines. Why? Because, as I mentioned before, MPEG encoders work on 16 x 16 blocks, so it is very useful indeed if the height is divisible by 16. 576 is OK, but 486 is not. The next value down that is divisible by 16 is 480. Therefore, most A to D conversion crops to this size. Note that I said “crop” and not “scale”. Scaling 486 to 480 would be a nightmare and probably yield a fuzzy picture. It is far easier simply to lop off the top and bottom 3 rows and output the rest. This is indeed what is almost always done. (Some high-end A to D converters give all 486 rows.)
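The cropping rule can be sketched in a few lines of Python. The function name `crop_to_multiple` is hypothetical, and splitting the spare rows evenly between top and bottom is the behaviour described above rather than something every converter is guaranteed to do.

```python
def crop_to_multiple(height, block=16):
    """Return (cropped height, rows cut from top, rows cut from bottom)."""
    target = height - height % block   # largest multiple of `block` not above `height`
    spare = height - target
    top = spare // 2
    return target, top, spare - top

print(crop_to_multiple(486))  # (480, 3, 3)
print(crop_to_multiple(576))  # (576, 0, 0)
```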
Capture cards make good examples of A to D converters because:
All capture cards capture at (or attempt to capture at) 13.5 MHz and capture (or attempt to capture) the standard active picture widths, right? Sadly, no, as has already been alluded to. If this were the case, life would be too simple! Most capture cards actually over-capture or under-capture. That is, they capture more or less than the standard active picture widths. So, providing that they capture at exactly 13.5 MHz and over or under-sampling doesn’t affect the PAR, we can take the PARs to be:
An example is called for.
Suppose that we have a PAL video capture card that only captures 51.56 us of active picture instead of 52 us. This indeed actually happens with BT878 and the iuLabs universal WDM v184.108.40.206 driver. (http://www.doom9.org/index.html?/capture/introduction.html) How many horizontal pixels will be captured?
51.56 x 13.5 = 696 horizontal pixels, approximately.
Because ITU-R BT.601 says that we must output 720 horizontal pixels, we would hope that the card will pad the image with 12 pixels either side, leaving the PAR alone. An ITU-R BT.601 compliant device reading this information will not know (or indeed care) that there has been any padding because it will simply read the pixels and “assume that” the PAR is 128 : 117. This would, of course, give us the correct aspect ratio and the resulting picture will have small black bars either side. Again, this would make life far too simple, so what actually happens? Most capture cards ask the user prior to the capture what pixel output they require. On the surface, this sounds like a nice friendly question but unfortunately it is highly dubious! The reason is that, if asked, I would prefer it to output exactly what had been scanned. So, my preferences would be (in order of preference):
In fact, any of them will do, as none is particularly “harmful” to the PAR. The card will never offer the first choice, however, because most card and driver combinations would hate to admit that they are, in this case, under-scanning. Well, that’s not so bad, is it? Sadly, it is. To see why, we need to look at horizontal resampling.
The driver of the A to D converter tries to be clever here. If I ask for 720 x 576 pixels, it says “Ah – you want 720 pixels do you? In that case, I’ll resample the image horizontally, as in stretch it, to give you 720 pixels. No black bars for you Sir!” You can scream and shout that you actually want your black bars, but “driver knows best”!
We haven’t looked at horizontal resampling yet, but it doesn’t take much to work out that this does affect the PAR and invalidates our earlier assumption about a changing sample width not affecting the PAR, which consequently means that the capture card will have stretched (or shrunk) the image, albeit by a small amount. It is worth emphasizing here that it is not the fact that the image has been over or under scanned that is the problem per se. It is simply that the card resamples the result of the A to D conversion to the format that you requested.
How do we overcome this changing PAR? There are two solutions.
Let’s take another example of capturing PAL using a sample rate of 13.5 MHz and a sample width of 53.333… us. How many pixels are captured? 53.333… * 13.5 = 720 pixels. Let’s also ask the card to output 720 x 576, which it should offer us. In this case, the card will not do any resampling because it has captured 720 pixels. Hurrah! One thing to note is that the card has, however, over-captured by 18 pixels. These 18 pixels are captured horizontal blanking data. However, the card will not worry about this and show it as black. Therefore, we will get black bars either side, but technically, these black bars are simply a result of over-capturing and not a post-capture addition of black bars. Either way, the PAR is not changed.
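The pixel counts in these capture examples are simply the sample width multiplied by the sample rate. A small sketch, with a hypothetical helper name `captured_pixels`, shows the under-capturing card, the standard width and the over-capturing card side by side:

```python
from fractions import Fraction

RATE_MHZ = Fraction(27, 2)  # 13.5 MHz

def captured_pixels(sample_width_us):
    """Horizontal pixels captured = sample width in us * sample rate in MHz."""
    return RATE_MHZ * Fraction(sample_width_us)

under = captured_pixels("51.56")             # the under-capturing card above
standard = captured_pixels(52)               # standard PAL active picture
over = captured_pixels(Fraction(160, 3))     # 53.333... us

print(float(under), standard, over, over - standard)  # 696.06 702 720 18
```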
Now let’s take another example of capturing NTSC using a sample rate of 12.306394 MHz (this rate is explained in the next section) and a sample width of 52.80 us. How many pixels are captured? 52.80 * 12.306394 = 649.78 (approx.) pixels. Let’s also ask the card to output 640 x 480, which it should offer us. In this case, the card crops 6 lines vertically to get the resolution down to 480 lines. We’d hope that it would also crop horizontally, but it doesn’t – it resamples the picture so that exactly 640 pixels are output. If we call the PAR before the resampling PARold and the PAR after the resampling PARnew, then we can say that:
PARnew = 649.78 / 640 * PARold
Notice that this is independent of whether it is NTSC or PAL and it makes no statement about the number of active lines. However, one of our earlier equations related the PAR and the sample frequency, f. These assumed that the PAR had been unchanged by any resampling. Of course, for PARold this is true because the resampling has not yet taken place. So we can substitute in our equations for the PAR for NTSC and PAL:
PARnew = (w * f / Wnew) * (H * DAR / (52 * f)) = (H / Wnew) * (w / 52) * DAR
Similarly for NTSC:
PARnew = (w * f / Wnew) * (H * DAR * 90 / (4739 * f)) = 90 * (H / Wnew) * (w / 4739) * DAR
Now at last we have the PAR in terms of the sample width, w, and we are no longer assuming that the card does not adjust the PAR prior to outputting the pixels. We do, however, need to be careful about the meaning of “Wnew” if we are to use this equation in both situations of a card that scales the picture to the requested output size and one that adds black bars (or crops). Wnew is the width in pixels after any resampling (scaling) of the picture but excluding any cropping or adding of black bars. So, a PAL device that had a sample width of 52 us, but which added black bars to get the horizontal size up to 720 pixels would work as follows:
H = 576
Wnew = 702 (from 13.5 * 52), NOT 720
w = 52
Therefore PARnew = 576 / 702 * 52 / 52 * 4 /3 = 128 / 117. Therefore, the PAR is exactly what it was before the black bars were added. However, if instead the card resampled the picture to 720 pixels wide, we get:
PARnew = 576 / 720 * 52 / 52 * 4 /3 = 128 / 120, so indeed it has changed from 128 : 117.
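Both PAL cases can be checked with a small sketch of the PAL equation. The function name `pal_par_new` is hypothetical, and `w_new` must be the width after any scaling but excluding padding or cropping, as explained above.

```python
from fractions import Fraction

def pal_par_new(lines, w_new, sample_width_us, dar=Fraction(4, 3)):
    """PARnew = (H / Wnew) * (w / 52) * DAR for PAL."""
    return Fraction(lines, w_new) * Fraction(sample_width_us) / 52 * dar

print(pal_par_new(576, 702, 52))  # 128/117 (black bars added, PAR unchanged)
print(pal_par_new(576, 720, 52))  # 16/15, i.e. 128/120 (picture resampled to 720)
```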
[Diagram: resampling, showing that the new PAR is not intuitive]
Before getting into the subject of converting from one format to another whilst maintaining an exact PAR, we need to look at devices that don’t sample at 13.5 MHz.
So far, we’ve taken 13.5 MHz to be a ubiquitous number. However, this is not always the case. A good example is a PAL TV tuner card running on a PC. (Sorry about all the PAL examples – it’s just that PAL is easier because there are fewer nasty fractions to deal with.)
Let’s design such a PAL TV tuner card from scratch and see where it takes us.
Firstly, what’s the target’s PAR? Well, being a computer monitor, it is exactly 1:1, i.e. square. We’re not talking about 767:768 square, we’re talking about “real” square. We’ll also assume that the DAR is 4 : 3, which it almost always is. (1024 x 768, 800 x 600 and so on.) Finally, we’ll try to adhere to the standards described so far.
Firstly, we’ll read the active 52 us of the PAL signal and sample it at 13.5 MHz. This gives us:
52 * 13.5 = 702 pixels wide x 576 pixels high.
If we send this to a screen with a PAR of 1:1, we’ll get an image that is too tall. This is because PAL pixels, as we know, are slightly wider than they are tall. If we squeeze these pixels in on either side to get a 1:1 pixel, the image will stretch in height. So, what are the possibilities? One thing we could do is resample the image horizontally in software before sending it to the screen to force it into a 4:3 ratio. How much scaling?
Required pixel width = 576 / 3 * 4 = 768.
(This is an easy calculation because the pixels are square.) So, we need 768 pixels. Therefore, we need to resample the image horizontally by a factor of 768 / 702. It’s no coincidence that this is resampling by 128 / 117. Doing this will give us the required pixel and display aspect ratio. Any further resampling (for example to get full screen) is proportional and can be done in software.
The downside of this is that this resampling has to be done “on the fly” by the TV card, which would be a processor-intensive operation. Another solution would be simply to change the sample rate so that it gives us 768 pixels “right out of the box”. That way, no horizontal resampling is required. What sample rate does this? All we have to do is increase the sample rate using the same proportions as earlier: 128 / 117.
New sample rate = 13.5 * 128 / 117 = 1728 / 117 = 192 / 13 ~ 14.769230… MHz
Problem solved! This is actually the sample rate that PAL TV tuner cards work to.
Doing the same for an NTSC tuner card, we get:
New sample rate = 13.5 * 4320 / 4739 = 58320 / 4739 ~ 12.306394… MHz
Again, this is actually the sample rate that NTSC TV tuner cards work to.
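A short sketch confirms that these two rates give exactly the pixel widths needed for square pixels. The variable names are illustrative, and the active line lengths (52 us and 4739 / 90 us) are the ones used throughout this article.

```python
from fractions import Fraction

BASE_RATE_MHZ = Fraction(27, 2)  # 13.5 MHz

# Scale the 13.5 MHz rate by the "perfect" PAR of each system
pal_rate = BASE_RATE_MHZ * Fraction(128, 117)
ntsc_rate = BASE_RATE_MHZ * Fraction(4320, 4739)

# Pixels across the active line at the new rates
pal_width = pal_rate * 52
ntsc_width = ntsc_rate * Fraction(4739, 90)

print(pal_rate, float(pal_rate), pal_width)     # 192/13, about 14.769230, 768
print(ntsc_rate, float(ntsc_rate), ntsc_width)  # 58320/4739, about 12.306394, 648
```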
So, if we sample the picture at a rate that is different from 13.5 MHz, we will get the right number of pixels to display the picture at the correct 4 : 3 DAR.
So far, we’ve dealt with the following types of device for both NTSC and PAL:
Notice that all three contain A to D converters. However, when it comes to transferring between digital devices, not all do an A to D conversion. (Actually, many do, but the conversion happens internally with a D to A conversion again. An example of this is a DVD player. However, we will ignore the fact that this happens and simply treat the whole thing as a “black box” into which a signal goes and out comes a signal.) Some common examples of other types of “kit” are:
Notice that I deliberately listed DVDs and DVD players separately. This is because the whole process of manufacturing a DVD from the camera is effectively an A to D process, i.e., the camera (analogue) to an MPEG-2 stream on the DVD (digital). In practice, there is (or can be) more than one A to D conversion, but we do not need to worry about the details here. However, a DVD player takes an MPEG-2 stream (digital) and outputs some kind of analogue video signal, such as RGB (analogue). So, for a DVD player, we have a D to A conversion. (Again, there is actually more than one conversion, such as occurs between the MPEG-2 decoder chip and the scaler / deinterlacer chip. However, we can ignore these details.)
By comparison, often what is found on the Internet is a statement like “the sample rate of a DVD is 13.5 MHz”. Clearly, a DVD doesn’t sample anything – it’s just a shiny storage medium for data! The player also doesn’t sample anything as it’s only fed with a digital signal. (Again, not strictly true if we want to get to the innards of a DVD player, which we don’t!) What we can say is that the process of manufacturing a DVD involves sampling an analogue signal at 13.5 MHz and that a DVD player will faithfully reproduce the analogue signal on the basis that the original encoding was done at 13.5 MHz. The distinction is somewhat pedantic, but it clarifies what is really meant by “the sample rate of a DVD is 13.5 MHz”. Most sites don’t distinguish between “DVD” and “DVD player” and just say “DVD”. This site is the same and assumes that DVD players, DV players, VCD players and so on all faithfully reproduce the original analogue signal because they make an assumption about the original sample rate that was used.
This is where it gets fun, because we have varying sample rates, PARs, horizontal lines and vertical lines! However, we also have the tools to work through methodically, always remembering that we want to preserve the aspect ratio of the source.
This example works through the process very laboriously. However, it puts to use everything learnt so far and makes subsequent calculations easier.
Suppose that we have an NTSC TV tuner card that has an active sample width of 52.6555… us and that we want to make an NTSC SVCD out of it. The first thing to do is get more information about the source and destination.
We look up on the Internet and find that the sample frequency of an SVCD is 9 MHz. We can immediately calculate the PAR of an SVCD using the equation that relates PAR to the sample frequency:
PAR = (H * DAR) * 90 / (4739 * f)
PAR of SVCD = 648 / (52.6555… * 9) = 6480 / 4739.
Next, we look up on the Internet how wide and high an SVCD is in pixels. The answer is 480 x 480. Let’s stop and think about what this means for a second. The way I look at it is: if I started with an NTSC analogue signal and had to encode an SVCD out of it, what would I have to do? The first thing would be to find out what the sample rate is for SVCD. We have already found this out when calculating the PAR: it is 9 MHz. How much of the line is sampled when creating an SVCD, given that the active picture is 52.6555… us? We need to look this up and discover that it is 53.333… us (that is, it over-samples the active picture). How many pixels does this create horizontally? 9 x 53.333… = 480 pixels. What about vertically? Well, NTSC is 486 lines and it comes as no surprise that the creation of SVCD invariably crops 6 of the lines to give 480 vertically. So in summary for SVCD we have a sample rate of 9 MHz, giving a PAR of 6480 / 4739, a sampling width of 53.333… us, a total sampling window of 480 x 486, an output window of 480 x 480 and no scaling, so the PAR stays at 6480 / 4739. Phew! We shall assume that the SVCD player is faithful to these figures and outputs the analogue video stream appropriately.
Now the source TV tuner card. We know the sample frequency is 12.306394 MHz. Straight away we can calculate the PAR:
PAR = 648 / (52.6555… * 12.306394…) = 1 (i.e. square pixels). (H * DAR was shown to be 648.)
After looking up how much of the 52.6555… us active picture is sampled, we find out that exactly all of it and no more is, i.e., 52.6555… us. This gives us a horizontal pixel width of 52.6555… * 12.306394… = 648 pixels. However, NTSC cards usually resample the output to 640 pixels. What about vertically? Well, NTSC is 486 lines as ever, and the TV card, like nearly all cards, crops 6 lines, so it outputs 480 pixels vertically. This means that the output is 640 x 480 but the picture has been resampled, giving a revised PAR of (1 * 648) / (1 * 640) = 81 / 80.
Looking at the vertical size of the picture, we’re in luck because both the source and destination are the same. What do we have to do in the horizontal dimension? There are so many different ways of approaching this! Here’s one of them:
“Clearly”, we need to resample the picture before creating the SVCD. Let’s start by resampling it to get the PAR back to 1:1. What is the width in pixels we’ll be creating? We simply need to resample back to what the width in pixels was before the card resampled it down, i.e. 648 pixels. So, we resample the picture creating an output of 648 x 480. This gets the PAR back to 1:1. What if now we resample the picture again down to 480 pixels wide? This would give us the correct number of pixels for an SVCD. This would then be creating a PAR of 648 / 480 = 1.35 (remember that we started off with a PAR of 1). However, the required PAR for SVCD is 6480 / 4739 = 1.37 approximately (slightly wider than we got). So, simply resampling down to 480 will not do the trick. OK, let’s turn the question on its head and ask what we have to resample the 648 pixels down to, to get a PAR of 6480 / 4739? If we denote the number of pixels that we resample down to as X, we can say that:
648 / X = 6480 / 4739, because it is a simple ratio.
Therefore, X = 473.9
So, let’s resample the 648 pixels down to 473.9 pixels. (In the real world, it’ll have to be 474 pixels, but let’s stay hypothetical for now.) This gives us a PAR of 6480 / 4739 and an output of 473.9 x 480. Because we are now short of 6.1 pixels, we’ll have to add these ourselves. In summary, this is what we’ve done:
Notice that stages 2 and 3 resample the picture up and then down. Can the first resampling be skipped? Look at it like this: if at stage 2, we resampled it up to 12345678 x 480 pixels and then resampled it down to 473.9 pixels, would it make a difference? The answer is no, because every time you resample, the same total picture exists; it is simply being stretched one way and then the other. Indeed, resampling twice in a row is bad because each scaling operation loses some information. Therefore, we can change the procedure to:
In the real world, with integer pixels:
Before leaving this example, the fact that we were able to remove the second stage in the calculation does raise the question of whether we care that the TV card itself resamples the image or not. This is a question that is often skipped or glossed over on many web sites. In this case, clearly not. However, PARs were a big concern when we talked about how to get the picture displaying correctly on a screen. Why is this? Simply put, if we have a sequence of events that goes something like:
… only the final resampling need be done. Simply squeezing and stretching the picture a few times doesn’t affect the final result. In the case of a computer monitor being the final destination, there is only one resampling stage that we need to worry about and that is the one of the card resampling the image down to 640 pixels wide. In the case of converting to an NTSC SVCD, however, there are subsequent resampling operations that take place, making the preceding one superfluous. Therefore, it is only the final resampling that we really need to worry about.
This example is almost identical to the previous except that in this case, the TV tuner card crops rather than resamples from 648 to 640 pixels wide. Does this make a difference to the calculations? (Admittedly, we’ve lost a small amount of the picture, but let’s not worry about that – all we want is to keep the aspect ratio correct.)
We start with a picture that is 640 x 480 and has a PAR of 1. What if we simply resample the picture again down to 480 pixels wide? This would give us the correct number of pixels for an SVCD. However, we would then be creating a PAR of 640 / 480 = 1.333, but the required PAR is 1.367 approximately (slightly wider than we got). So, simply resampling down to 480 will not do the trick. OK, let’s turn the question on its head and ask what we have to resample the 640 pixels down to, to get a PAR of 6480 / 4739? If we denote the number of pixels that we resample down to as X, we can say that:
640 / X = 6480 / 4739, because it is a simple ratio.
Therefore, X ~ 468.05
So, in this case, we have to resample the picture down to 468.05, let’s say 468, being the nearest integer. This leaves us short of 12 pixels, so we’ll pad 6 pixels either side. In the real world, with integer pixels:
Clearly, it does make a difference that the TV tuner card crops and not scales the picture. Can we come up with a generic equation or procedure that takes either situation into account? Of course we can!
In the first (TV card resamples) example, we resampled to:
(648 * 4739) / 6480 pixels, and in the second (TV card crops) example, we resampled to:
(640 * 4739) / 6480 pixels.
The only difference is the 648 and the 640. However, in the first example, we notice that:
640 = 648 * (80 / 81) = 648 / PAR of scaled picture
Re-writing this equation, we get:
648 = 640 * PAR of scaled picture
We can write this into the first example’s scaling, saying that we scale the picture to:
(640 * PAR of scaled picture) * 4739 / 6480 pixels.
In the second example, the PAR of the scaled picture was 1 because it wasn’t scaled, so we can also re-write the second example as:
(640 * PAR of scaled picture) * 4739 / 6480 pixels.
Bingo! We have two equations that are the same, so we now have a way of calculating the scaling that we have to do which is “independent” of whether our capture card scales or crops (i.e. we don’t need to think hard about it; it is not mathematically independent). We need to rearrange the equation slightly, because 4739 / 6480 = 1 / (6480 / 4739) = 1 / PAR of NTSC SVCD. The completely generic equation that defines the number of pixels “X” horizontally that we need to scale to is therefore:
X = (Wsrc * PAR of scaled picture) * (1 / PAR of destination), which is the same as:
X = Wsrc * (PARsrc / PARdst)
Considering the hoops that we’ve had to jump through to get here, this is a remarkably simple equation! However, we have explained where it comes from and why it is an “exact” equation and not just an approximation.
Wsrc will be obvious because it is usually printed on the outside of the box, or is in the software configuration! PARdst we can look up in a table (or work it out easily enough from the sample rate). PARsrc is the trickiest one, but only just, because it requires that we know whether the card scales or crops the image and hence what the PAR is.
Don’t forget that some cropping or adding of black pixels may still be required after the resampling.
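As a rough check, the generic equation can be written as a small Python function and run against both TV card examples. The function name resample_width is made up for this sketch; the PAR of 81 / 80 for the scaling card comes from the 648 to 640 relationship given above.

```python
from fractions import Fraction

def resample_width(w_src, par_src, par_dst):
    """Return the exact width X = Wsrc * PARsrc / PARdst as a Fraction."""
    return w_src * Fraction(par_src) / Fraction(par_dst)

PAR_DST = Fraction(6480, 4739)  # NTSC SVCD PAR quoted in the text

# Card that scales 648 -> 640: the PAR of the scaled picture is 81 / 80.
x_scaled = resample_width(640, Fraction(81, 80), PAR_DST)

# Card that crops to 640: the picture is not scaled, so its PAR is 1.
x_cropped = resample_width(640, 1, PAR_DST)

print(float(x_scaled))   # the same as (648 * 4739) / 6480
print(float(x_cropped))  # the same as (640 * 4739) / 6480
```

The same function serves both cases; only PARsrc changes, which is the whole point of the generic form.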
We should be able to put our new generic equation to good use now. Firstly, let’s find out about the source. We look up and find that it samples a standard PAL picture of 576 lines at 13.5 MHz, giving a PAR of 128 / 117. This is the PAR of the source before the DV camera has done any last minute scaling. Does it indeed do this? We look up the sample width and find it to be 53.333… us. So, the width horizontally is 13.5 * 53.333… = 720. The standards say that we must output 720 pixels horizontally and 576 vertically, which is rather handy because it means that we don’t have to crop or resample horizontally or vertically; we’ll just leave it as is. Therefore, the PAR is unchanged and we can say that PARsrc = 128 / 117. That’s the hard bit! Now for the easy bit of looking up the PAR of the destination, the computer monitor, which of course is 1. So, we need to resample the image to the following number of pixels:
720 * ((128 / 117) / (1 / 1)) ~ 787.692… (Round it up to 788)
So, we resample the picture to 788 x 576. On a computer monitor, we probably don’t need to do any last minute cropping or adding of black bars, but suppose that the software we use insists on some predefined format such as 768 x 576. We would then have to take off 20 pixels, and taking 10 off each side would make sense.
Let’s check that this figure of 787.692… is right and suppose that we do indeed resample the picture to this width. What proportion of the screen contains active picture (remembering that the DV originally over sampled each active picture from 52 us to 53.333… us)? Clearly it is 52 / 53.333… (or 702 / 720 – whichever way you want to look at it). So, the proportion of picture that is active in our new window is:
787.692… * 52 / 53.333… = 768
So, the DAR is 768 : 576 = 4 : 3. So, we have successfully got back to a 4 : 3 DAR.
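This check can be reproduced in a few lines of Python. It is only a sketch: the constant names are invented here, and 53.333… us is written as the exact fraction 160 / 3 so that the result comes out without rounding error.

```python
from fractions import Fraction

W_SRC = 720
LINES = 576
PAR_SRC = Fraction(128, 117)    # PAL DV
PAR_DST = Fraction(1)           # square-pixel computer monitor
ACTIVE_US = Fraction(52)        # active picture width in microseconds
SAMPLED_US = Fraction(160, 3)   # 53.333... us sampled by the DV device

# Width after resampling for the monitor (787.692...).
resampled = W_SRC * PAR_SRC / PAR_DST

# Only 52 / 53.333... of that width holds active picture.
active = resampled * ACTIVE_US / SAMPLED_US

# Display aspect ratio of the active picture area.
dar = active / LINES

print(float(resampled), active, dar)
```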
Note that in the real world, there’s usually no need to do this resampling because the software that we’re using to display the picture usually “knows” that we need a 1 : 1 PAR and so it resamples “on the fly”. However, it does depend on how the data were transferred from the DV camera. As an example, transferring uncompressed DV (about 6 times the size it is stored on the DV tape) results in Windows Media Player 10 not correcting for a 1 : 1 PAR monitor. However, transferring the data as a Type 2 AVI (same size as on the tape), results in Windows Media Player 10 correcting the picture to display correctly. (I did measure the “uncorrected” PAR and got PAR: 0.912, which is very close to the expected PAR of 0.914062.) The moral of the story is, don’t use WMP 10 to decide what the internal PAR of the file is!
We already know that PAL DV is not resampled before output and has a PAR of 128 / 117. What about the destination? The width of the picture in pixels is 720 and the PAR is 128 / 117. Again, the heights of both the source and destination are 576, so we do not need to worry about that. We resample the picture to the following number of pixels:
720 * ((128 / 117) / (128 / 117)) = 720
So, we don’t actually need to resample the DV at all. What about cropping or adding black bars? As a result of not resampling, we’ll end up with a picture 720 x 576, which is exactly what we need for a PAL DVD, so no cropping or adding of black bars is needed either. This has to be one of the easiest transfers to do!
Note that whilst you are editing the video on a computer, the aspect ratio will be wrong, of course, unless the software compensates on the screen (as long as it doesn’t adjust the video data itself). This does not matter, because the final output will be right for a PAL DVD.
This is getting into tiger country now, because this will involve scaling vertically to get the correct PAR. In general, vertical scaling has a noticeably adverse effect on the image. However, let’s not worry about that and carry on regardless. One word of caution: if resampling vertically, ensure that the source material is deinterlaced first. DVDs are generally (but not always) interlaced, and any attempt to scale an interlaced picture will be horrible!
The source material has a size of 720 x 576 and a PAR of 128 : 117, and the destination has a size of 720 x 480 and a PAR of 4320 : 4739. Let’s first try to resample vertically and see where this takes us. So, firstly we resample vertically to get 480 lines, giving us 720 x 480 pixels. What does this do to the PAR? The new PAR can be calculated as follows:
New PAR = (128 / 117) * (480 / 576) = 320 / 351
(If you think that this should be 128 / 117 * 576 / 480, remember that resampling takes all the existing information and “overlays” the new sample on top of it, giving a new PAR.)
What do we have to do horizontally to get the PAR to 4320 : 4739? Put another way, how many pixels do we have to resample to horizontally to get the PAR to 4320 : 4739? Let’s call this number X and do the sums. If we resample to X pixels wide, the new PAR is:
(320 / 351) * (720 / X) = 4320 / 4739, which gives X ~ 720.076
So, we don’t need to resample horizontally to get extremely close to the correct PAR! Let’s see if we can get a more generic equation out of this. Rearranging the above equation for X and substituting the 128 / 117 * 480 / 576 back in for the 320 / 351, we get:
X = 720 * (128 / 117) * (480 / 576) / (4320 / 4739)
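But, the 720 is actually Wsrc, 128 / 117 is PARsrc, 4320 / 4739 is PARdst, and 480 / 576 was the factor by which we scaled the image vertically. So, we can write:
X = Wsrc * Fv * Fh, where Fv is the vertical scaling factor (destination height / source height) and Fh = PARsrc / PARdst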
This is a more generic version of the equation in example 2. In example 2, Fv is 1 because we are not scaling vertically.
When resampling in software, such as with VirtualDub, you are asked what the target sizes are that you are after as opposed to any PARs. So, taking this example again, we would tell VirtualDub that the target height was 480, because that is exactly what we need for NTSC. From this, we calculate Fv, which is 480 / 576. We look up or calculate the PARs of the source and destinations in a table and from that calculate Fh, which is (128 / 117) / (4320 / 4739). We know the width of the source is 720, so multiplying the three together, we get 720.076, which we round to 720. So, we tell VirtualDub to scale to 720 x 480. VirtualDub then re-samples the picture to this size and because the destination already has the correct PAR, no further cropping is required.
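The calculation of the target width from Fv and Fh can be sketched in Python as follows. The function name target_width is an assumption of this sketch and has nothing to do with VirtualDub itself; it simply multiplies Wsrc, Fv and Fh together as described above.

```python
from fractions import Fraction

def target_width(w_src, h_src, h_dst, par_src, par_dst):
    """Return X = Wsrc * Fv * Fh as an exact Fraction."""
    f_v = Fraction(h_dst, h_src)                  # vertical scaling factor
    f_h = Fraction(par_src) / Fraction(par_dst)   # ratio of the two PARs
    return w_src * f_v * f_h

# PAL DVD (720 x 576, PAR 128 / 117) to NTSC DVD (480 lines, PAR 4320 / 4739).
x = target_width(720, 576, 480, Fraction(128, 117), Fraction(4320, 4739))

print(float(x))   # about 720.076
print(round(x))   # the width to enter in the resize dialogue
```

Setting h_dst equal to h_src makes Fv equal to 1, which reduces the function to the simpler equation used when there is no vertical scaling.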
Note that this gets the destination aspect ratio correct. Of course, this does not make it NTSC because the frame rate is still wrong. Software such as VirtualDub can correct the frame rate as well, though I have not tested how well it does it.
Now we are really going to town! Knowing that VCDs are much “smaller” than DVDs, it doesn’t take much to work out that this will involve resampling in both dimensions. Let’s go straight for the equation.
We look up in a table and find that the destination height is 240. The height of the source is 576. So, Fv is 240 / 576. We look up the PARs of the source and destination in a table and from that calculate Fh, which is (128 / 117) / (4320 / 4739). The width of the source is 720, so X = 360.038 approximately. So, if we resample the picture to 360 x 240, we get the correct aspect ratio. However, in this case, the target size is 352 x 240. Therefore, we need to crop the picture by 8 pixels horizontally (4 on the left and 4 on the right) to get the correct size.
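The last step, deciding how many pixels to crop or pad once the resampled width is known, can be sketched like this. The helper fit_width is hypothetical and simply splits the difference evenly between the left and right edges.

```python
from fractions import Fraction

def fit_width(resampled, target):
    """Split the difference between resampled and target widths.

    Returns (action, left, right), where action is "crop", "pad" or "none".
    """
    diff = resampled - target
    if diff == 0:
        return ("none", 0, 0)
    action = "crop" if diff > 0 else "pad"
    left = abs(diff) // 2
    right = abs(diff) - left   # takes the odd pixel, if there is one
    return (action, left, right)

# PAL DVD to NTSC VCD: X = Wsrc * Fv * Fh with the figures from the text.
x = 720 * Fraction(240, 576) * Fraction(128, 117) / Fraction(4320, 4739)
width = round(x)

print(float(x), width)
print(fit_width(width, 352))
```

The same helper covers the earlier SVCD case, where a 468 pixel picture has to be padded out to 480.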
This should be an easier example again, but is worth doing to get comfortable with the equations. Therefore, we’re not going to look up pre-calculated values in tables this time, we’re going to work them out for ourselves. Some values we have to look up from the standards. These are:
· PAL lines for both DV and SVCD: 576
· PAL DV sample frequency: 13.5 MHz
· PAL SVCD sample frequency: 9 MHz
· DV does not change the PAR through resampling prior to output
Source – PAL DV:
PAR = 576 * (4 : 3) / (52 * 13.5) = 128 : 117
Any pre-output resampling that affects the PAR? No.
Output height: 576
Output width: 720
Destination – PAL SVCD:
PAR = 576 * (4 : 3) / (52 * 9) = 64 : 39
Output height: 576
Fh = (128 / 117) / (64 / 39) = 2 / 3
Fv = 576 / 576 = 1
Wsrc = 720
X = 720 * 1 * 2 / 3 = 480
So, if we resample the DV output to 480 pixels horizontally, we will get the correct PAR for SVCD and we will output 480 x 576 pixels, which is rather handily the size we want anyway, so no cropping need be done.
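The whole of this worked example can be followed through in Python. This is a sketch under the assumptions stated in the text (576 lines, a 4 : 3 DAR, 52 us of active picture and the two sample frequencies); the function name pixel_aspect_ratio is invented for the illustration.

```python
from fractions import Fraction

def pixel_aspect_ratio(lines, dar, active_us, sample_mhz):
    """PAR = (lines * DAR) / (active width in us * sample frequency in MHz)."""
    return (lines * Fraction(dar)) / (active_us * Fraction(sample_mhz))

DAR = Fraction(4, 3)

# Source: PAL DV sampled at 13.5 MHz (27 / 2 as an exact fraction).
par_dv = pixel_aspect_ratio(576, DAR, 52, Fraction(27, 2))

# Destination: PAL SVCD sampled at 9 MHz.
par_svcd = pixel_aspect_ratio(576, DAR, 52, 9)

f_h = par_dv / par_svcd        # horizontal factor, PARsrc / PARdst
f_v = Fraction(576, 576)       # no vertical scaling
x = 720 * f_v * f_h            # width to resample to

print(par_dv, par_svcd, f_h, x)
```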
In nearly all cases, A to D devices do not sample the standard widths of 52 us for PAL and 52.6555… us for NTSC. In fact, they nearly all change the scan widths so that the number of pixels output on the horizontal axis is exactly the number needed by the format’s standards. This means that the A to D device doesn’t need to worry about adding black bars or cropping the picture horizontally. It simply outputs all the pixels sampled, giving the required size horizontally. The main exceptions to this are capture cards, which resample (scale) the picture horizontally so that they output the correct number of horizontal pixels. This, of course, affects the PAR, which in turn affects the resampling calculations.
This formula can be used to calculate the pixel aspect ratio from the height of the active screen, the display aspect ratio and the sample frequency: PAR = (active height * DAR) / (active line width in us * sample frequency in MHz), where the active line width is 52 us for PAL and 52.6555… us for NTSC. “Active screen” height here means the height before any vertical cropping has occurred. For NTSC it is usually 486 and for PAL it is usually 576. In all the examples used, the DAR is 4 : 3.
These formulae can be used to calculate the pixel aspect ratio after it has undergone pre-output resampling (for example by a video capture card) from the height of the active screen, the new width of the picture in pixels, the sample width and the display aspect ratio. “Active screen” height here means the height before any vertical cropping has occurred. For NTSC it is usually 486 and for PAL it is usually 576. In all the examples used, the DAR is 4 : 3.
This formula can be used to calculate the width of picture in pixels that we need to resample to in order to get the correct PAR, from the PARs of the source and destination and the output heights of the source and destination screens: X = Wsrc * Fv * Fh, where Fv = destination height / source height and Fh = PARsrc / PARdst.
1. These values are not particularly important if the device doesn't resample the picture prior to output, as would happen if it adds black bars (or crops) or it already outputs exactly the right number of pixels horizontally anyway. The obvious exception is for the TV cards listed.