progressive av guide to video processing technology
What follows is as brief a description as I could manage of what video processing entails within an entertainment system, and how these processes are carried out by your display, source component or video processor. It is by no means the be-all and end-all of what is involved, but it touches on the basics of deinterlacing and scaling, on some of the other features you might find in a video processing device, and on why these matter when it comes to recreating your image just as the director intended. I hope this overview proves of some use when it comes to explaining to your friends exactly why you spent hundreds, even thousands, of pounds on that ugly black box in the rack! If not, I hope it was an interesting read nonetheless.... Liam McLaughlin - resident video nerd.

signals your video processor will come across

There are many signal types and resolutions used in video transmission today. What is more relevant for the video processor in your system is the type of resolution and material these signals are carrying, and what should be done to achieve the best performance from your display. These could be regular interlaced signals such as PAL TV, progressive scan DVD, or even a PC image. They could be high resolution/definition, meaning they are made up of many lines of information, or of a lower definition, meaning they contain comparatively fewer lines of detail for a single image. They may be interlaced in format, or progressive scan (denoted by an “i” or “p” in the name). A more complex entertainment system may contain the following sources:
These signals need to be converted to the resolution and refresh rate that best suit the display's “native resolution”, i.e. the exact number of pixels the display is made up of. Material should also be converted into the most appropriate frame format, such as 24p or 72Hz, depending on what the source is made up of and what the display can accept. Examples include:
- High Definition 42” Plasma: 1024 x 768
- FULL High Definition 42” Plasma: 1920 x 1080
- High Definition LCD TV: 1366 x 768
- FULL High Definition 32” LCD TV: 1920 x 1080
- Standard Definition LCD/DLP Projector: 848 x 480 / 1024 x 576
- High Definition / FULL High Definition LCD/DLP/LCOS Projector: 1280 x 720 / 1920 x 1080
- CRT Front Projection: variable from 480p upwards
n.b. CRT systems use electron guns scanning in horizontal lines to display an image; they are not made up of a fixed number of pixels as plasma/LCD screens and LCD/DLP projectors are. In this case the processor's task is to feed the highest resolution and refresh rate which the CRT device can display without issue.

what exactly is an interlaced or progressive signal?

We've talked about a signal being interlaced, or being deinterlaced to become a progressive signal. But what does that mean? Your traditional CRT television (the ugly box in the corner of your lounge you used to have before you got that plasma screen!) displays its image in an interlaced fashion. If you imagine a PAL-derived picture as a series of 576 lines, the CRT gun in the TV will quickly draw all 288 odd-numbered lines, and then a split second later all 288 even-numbered ones. This is done so quickly that the brain interprets the two fields of information together to assemble the full-resolution 576-line image. These two “fields” of odd-numbered and even-numbered lines, when combined, are known as a single video “frame”. A 50Hz signal, as we have in the UK, will show 50 fields in a single second: 25 odd-numbered line fields and 25 even-numbered line fields, making up 25 full frames, and they will be displayed as such on an interlaced display.

With today's digital displays, such as a plasma screen, the screen has no ability to show interlaced fields in the same interlaced fashion as a CRT TV does. A plasma screen is made up of a fixed number of pixels rather than moving scan lines, all of which must be lit at once to show one whole video frame at a time. So the same 50Hz signal can only be displayed as 50 “progressive” frames, i.e. the odd-numbered and even-numbered lines are combined, or deinterlaced, to produce only complete frames for progressive display.

But this joining process isn't always so easy. When the source is a film it is easier: the material has usually been recorded on a movie camera at 25 frames per second (24 for US), and each frame is already progressive. To record onto a DVD or broadcast over the airwaves, the 25 progressive frames are split into 50 odd- or even-lined fields, i.e. they are interlaced. Your average CRT TV has no problem displaying this in interlaced fashion as described above.
Each frame is shown as two fields, and this takes up the 50Hz frequency (25 frames x 2 halves). The digital display, however, has to merge the two sets of fields together to recreate the 25 frames in which the material was originally recorded. Since the signal started life in this format, a simple weave of the two is all that is necessary. To keep up the 50Hz frame count, each full frame is displayed twice, which is exactly how film material is shown in the cinema.

However, what if the original recording was made in interlaced format from the outset? This is what happens with most studio cameras, and the result is known as video material, rather than film material as above. At no point was there ever a progressive original. The first recorded field consists of only the odd-numbered lines from the camera snapshot, and the second recorded field (recorded 1/50th of a second later) shows only the even-numbered lines. This is in contrast to the above, where an originally progressive signal has been broken down into interlaced fields. The CRT TV again has no problem, but for the plasma screen this is not as easy as before. The two halves cannot be directly matched together again, as the information in the second field was recorded 1/50th of a second after the information in the first field. If it were just weave deinterlaced as with film, there would be inconsistencies wherever detail had shifted in time between fields. The video processor inside the screen must carry out some interpolation to work out what has moved, what hasn't, and how best to reproduce this information, essentially estimating the information that was never recorded.
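To make the difference between film and video material concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: a picture is a short list of text rows, and the function names `split_fields` and `weave` are invented for this example rather than taken from any real processor.

```python
def split_fields(frame):
    """Split a progressive frame (a list of rows) into its two fields.

    Lines are counted from 1, so the odd field holds lines 1, 3, 5, ...
    which sit at list indices 0, 2, 4, ...
    """
    odd_field = frame[0::2]
    even_field = frame[1::2]
    return odd_field, even_field


def weave(odd_field, even_field):
    """Rebuild a full frame by alternating odd and even lines."""
    frame = []
    for odd_row, even_row in zip(odd_field, even_field):
        frame.append(odd_row)
        frame.append(even_row)
    return frame


# Film material: both fields come from the same progressive frame.
film_frame = ["..##....", "..##....", "..##....", "..##...."]
film_odd, film_even = split_fields(film_frame)
rebuilt = weave(film_odd, film_even)  # identical to film_frame

# Video material: the even field is captured 1/50th of a second later,
# by which time the object has moved two pixels to the right.
earlier_snapshot = ["..##....", "..##....", "..##....", "..##...."]
later_snapshot = ["....##..", "....##..", "....##..", "....##.."]
video_odd, _ = split_fields(earlier_snapshot)
_, video_even = split_fields(later_snapshot)
combed = weave(video_odd, video_even)  # alternate lines no longer line up
```

Weaving the film fields gives back the original frame exactly, while weaving the video fields produces rows that alternate between the old and new object positions, which is the inconsistency described above.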

the film mode deinterlacing process

Film mode deinterlacing applies where the material was originally recorded progressively on film cameras, as discussed earlier. The important process here is lining up the correct interlaced fields for progressive display. Since the material started out progressive and then underwent a process of interlacing, the deinterlacing process should be able to recreate the original progressive frames exactly. The diagram below shows the process for a 25fps signal (what we use in PAL land!), ultimately being displayed as 50Hz progressive:
- the original 25 progressive frames (top row) are shown twice each in the cinema to avoid flicker (second row)

Easy so far? In NTSC land (e.g. USA) the process is a little more complicated. The film material is recorded at 24fps rather than 25fps, and is displayed at 48fps in the cinema, i.e. each frame is shown twice as with the PAL sources above. But in America the television system is broadcast at a 60Hz refresh rate, which matches neither the 24fps the material is in, nor the 48fps we would get if we simply doubled the frame count as we do in PAL 50Hz land. Some frame manipulation must be done, since the broadcast rate is 60Hz and this is what TVs have traditionally been designed to display at. So instead of simply showing each frame twice (which is known as a 2:2 sequence), every frame is repeated, with alternate frames shown three times rather than twice to make up the count of 60, i.e. 111 22 333 44 and so on. Because of this uneven repetition there is some noticeable image judder, especially with slow panning shots, scrolling text and the like.
As the above shows, doubling the 24fps material as the cinema does gives only 48 images per second, whereas a 60Hz signal needs 60 (60Hz means 60 times per second). So this 3:2 sequence is employed to fill the timeline with enough fields to make a 60Hz signal:
- the top line of this diagram shows the interlaced 48Hz signals from earlier, now converted to an interlaced 60Hz one and broadcast over the airwaves or stored on a DVD. Notice that fields 2a and 4a appear twice...

Cadence detection is the process of correctly identifying the frame sequence and applying the appropriate pulldown method, to ensure the original frames are analysed in the correct order and that repeated frames are ignored. The processor will have “film mode” or “video mode” detection to establish whether the material might be film-based and hence able to be deinterlaced perfectly; cadence detection then analyses and interprets the frame sequence of the material to ensure correct pulldown is applied as above. Better processors can detect not just 2:2 and 3:2 cadences, but also the more weird and wonderful ones found in broadcast. In some animations the frame sequence might be 5:5. With some broadcasts, in an attempt to speed up movies (and maximise advertising breaks!), the broadcaster might remove every 12th frame, creating a 3:2:3:2:2 sequence!! This was originally progressive material, and so should be able to be recreated perfectly using film mode deinterlacing. However, because the broadcaster altered the cadence of the material, only the better processors will recognise this new cadence and correctly reproduce the original frames. Without proper detection the processor might pair up the wrong fields, or revert to video mode deinterlacing, which doesn't take into account the full resolution of the image....

HD-DVD and Blu-Ray (2008 update)
Many of you may have come across the 24p scenario with your new Blu-Ray and/or HD-DVD player. This is similar in theory to the NTSC section above, in that HD movies are usually recorded on 24p film but need to be played back on 60Hz displays. With the popularity of HD, display manufacturers have started catching up with this and providing 24p-capable inputs.
This means that if you have a Blu-Ray or HD-DVD player which is able to output 1080p24, a compatible display will correctly show it at a direct multiple of 24 frames, i.e. at 48Hz or 72Hz (displaying at just 24 frames per second is avoided for the same reason as in the cinema: the picture would appear slow and stuttery). So instead of 3:2 sequencing the signal to 60Hz, the display will 2:2 or 3:3 the sequence instead. Of course, those of you with a video processor have been doing this for years anyway!!
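The arithmetic behind the 2:2, 3:2 and 3:3 sequences above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular processor works: each field is represented only by the number of the film frame it came from (real fields also alternate between odd and even lines), and the names `pulldown`, `run_lengths` and `recover_frames` are made up for the example.

```python
def pulldown(frames, cadence):
    """Repeat each source frame following a repeating cadence.

    cadence=(2, 2) shows every frame twice; cadence=(3, 2) shows
    alternate frames three times and twice.
    """
    fields = []
    for index, frame in enumerate(frames):
        repeats = cadence[index % len(cadence)]
        fields.extend([frame] * repeats)
    return fields


def run_lengths(fields):
    """Very crude cadence detection: count how long each repeat lasts."""
    runs = []
    for position, field in enumerate(fields):
        if position > 0 and field == fields[position - 1]:
            runs[-1] += 1
        else:
            runs.append(1)
    return runs


def recover_frames(fields):
    """Drop the repeats to get back the original frame sequence."""
    return [field for position, field in enumerate(fields)
            if position == 0 or field != fields[position - 1]]


one_second_pal = pulldown(list(range(1, 26)), (2, 2))    # 25 frames -> 50
one_second_ntsc = pulldown(list(range(1, 25)), (3, 2))   # 24 frames -> 60
one_second_72hz = pulldown(list(range(1, 25)), (3, 3))   # 24 frames -> 72

first_four = pulldown([1, 2, 3, 4], (3, 2))
# first_four is [1, 1, 1, 2, 2, 3, 3, 3, 4, 4], i.e. 111 22 333 44
```

Running `run_lengths(first_four)` gives the 3, 2, 3, 2 pattern, and `recover_frames(first_four)` returns the four original frames in order, which is the essence of what good cadence detection has to achieve on real, noisier material.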

the video mode deinterlacing process

Video mode deinterlacing is employed when the material was originally shot by a video camera as an interlaced signal. The studio camera used to record the shot shoots at 50Hz (PAL), but not as 50 progressive frames. The odd-numbered fields contain only the odd-numbered line information from the lens, and the even-numbered fields only the even lines. This is in contrast to film mode deinterlacing, where the two half-fields were created from an original full frame and can then be easily combined back into the 25 frame per second sequence. This presents an immediate problem: the even lines are recorded 1/50th of a second (or 1/60th for 60Hz NTSC) after the odd lines were taken. Objects that are moving within the image will not be in the same place in the odd field as they are in the even one. A simple combining technique (as with film mode processing - top set of circles) cannot be replicated without introducing “combing”, the line errors that appear when two non-matching fields are simply overlaid onto each other even though some slight movement has taken place between them (bottom set of circles).
With video mode deinterlacing, the processor must compare the two fields of information, recorded 1/50th of a second apart, and make its best interpolation of the missing information from the data available to create progressive frames for display. There are increasingly complex ways of doing this to create the most accurate rendition of the image.

Non-Motion Adaptive
This is the simplest technique. When the processor is presented with two fields of information, it simply discards the second field and creates the entire progressive frame from the odd-numbered lines. Each even line is calculated as an average of the odd lines of pixels above and below it, and this newly created frame is then displayed twice in place of the two interlaced ones. Since half the information is discarded, the resolution of the image is effectively halved. There are no combing artefacts of the kind produced by crudely overlaying moving fields atop each other as above, but half the detail is ignored in the process, so the final result is not as accurate as the original material. This is most evident away from blocks of solid colour, in the all-important more detailed areas of the image!
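As a rough illustration of the non-motion adaptive approach, the Python sketch below builds a full frame from the odd field alone. It assumes a field is a list of rows of brightness values and that the missing lines are a plain average of the lines above and below; the function name `interpolate_from_odd_field` is invented for this example, and the bottom line is simply repeated because it has nothing below it.

```python
def interpolate_from_odd_field(odd_field):
    """Build a full frame using only the odd-numbered lines.

    Every missing even line is the average of the odd line above and the
    odd line below it. The even field is never looked at.
    """
    frame = []
    for index, row in enumerate(odd_field):
        frame.append(list(row))
        # The last odd line has no line below it, so it is reused.
        below = odd_field[index + 1] if index + 1 < len(odd_field) else row
        frame.append([(a + b) / 2 for a, b in zip(row, below)])
    return frame


# A smooth gradient survives reasonably well...
gradient_field = [[0, 0], [100, 100], [200, 200]]
gradient_frame = interpolate_from_odd_field(gradient_field)

# ...but fine detail that only existed on an even line is lost entirely.
original_frame = [[0, 0], [255, 255], [0, 0], [0, 0]]  # bright line 2
odd_only = original_frame[0::2]                         # lines 1 and 3
rebuilt_frame = interpolate_from_odd_field(odd_only)    # bright line gone
```

The second example shows the halved resolution in action: the bright second line never reaches the output, because all of its information sat in the discarded field.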
Many video processing chips in displays themselves employ a slightly better technique than this with standard resolution signals. But with the 1080i interlaced HD format, where the computational power required to resolve all 1080 lines (and 1,920 pixels per line) is greater, the processors in quite a few displays resort to analysing just 540 lines, i.e. they use this non-motion adaptive deinterlacing technique.

Motion Adaptive deinterlacing
Motion adaptive deinterlacing is the more popular technique and involves an analysis step before any calculations are carried out. This step first ascertains whether there has been any motion between the two fields of information. If nothing appears to have changed between them, the processor will combine both fields completely, just as it would have done for film. The full line resolution is used to create a progressive frame of the same detail as the original, and there is no loss. However, where motion has been detected, the processor “adapts” its approach and employs an interpolative technique as described above. Commonly this means averaging line information using just half of the resolution of the original frame and interpolating the rest. This is known as field-based motion adaptive deinterlacing, since the whole field is studied for movement and then either the weave or the bob technique is employed. This is an improvement on non-motion adaptive deinterlacing, but in general there will always be some motion, so for moving material the technique produces results only a little better than those without motion adaptive detection.

More advanced algorithms use more precise analysis of the data. The next level up is to analyse regions of the fields rather than the whole field. If an object within the field has moved but large areas of the background have not, the static regions can be weaved at full resolution, leaving just the region where movement has taken place to require interpolation.
The best-known advanced technique is “per-pixel motion adaptive deinterlacing”. In this case the two fields are analysed right down to a pixel-by-pixel level as to whether motion has taken place between them. This is obviously better than basic field-based or region-based motion adaptive deinterlacing, since the processor resorts to interpolation only in the precise area where movement occurred. While some detail is bound to be lost in this area of motion, the image has a better chance of being well detailed overall. The results are very true to the original when this is done well.
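A per-pixel decision of this kind might look something like the Python sketch below. It rests on several assumptions that are not from the text above: motion is judged by comparing each even-field pixel with the same pixel in the previous even field, a fixed `threshold` decides what counts as movement, and the function name `deinterlace_per_pixel` is invented for the example. Real processors use far more sophisticated tests.

```python
def deinterlace_per_pixel(odd_field, even_field, previous_even_field,
                          threshold=10):
    """Weave static pixels and interpolate only the pixels that moved.

    A pixel on an even line counts as moving when it differs from the
    same pixel in the previous even field by more than the threshold.
    """
    frame = []
    for index, odd_row in enumerate(odd_field):
        frame.append(list(odd_row))
        # The last odd line has no line below it, so it is reused.
        below = (odd_field[index + 1]
                 if index + 1 < len(odd_field) else odd_row)
        new_row = []
        for x, pixel in enumerate(even_field[index]):
            moved = abs(pixel - previous_even_field[index][x]) > threshold
            if moved:
                # Motion: estimate the pixel from the odd lines around it.
                new_row.append((odd_row[x] + below[x]) / 2)
            else:
                # No motion: keep the real recorded pixel (weave).
                new_row.append(pixel)
        frame.append(new_row)
    return frame


odd = [[10, 10, 10], [10, 10, 10]]
previous_even = [[10, 10, 10], [10, 10, 50]]
current_even = [[10, 90, 10], [10, 10, 50]]  # one pixel has changed

result = deinterlace_per_pixel(odd, current_even, previous_even)
```

In this example the changed pixel (value 90) is replaced by an estimate from the neighbouring odd lines, while the unchanged detail (value 50) is woven in at full resolution.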
- within the solid area of colour, the pixels are green in both fields and so the processor can simply combine them immediately

Only the more powerful and better-implemented processors will employ per-pixel motion adaptive deinterlacing on standard definition signals. When it comes to using per-pixel motion adaptive deinterlacing on HD 1080i signals, only those high-powered processors which use chipsets such as Gennum's VXP or Silicon Optix's HQV enjoy this luxury (Lumagen wrote their own per-pixel technique for 1080i video deinterlacing for their VisionHDP and VisionHDQ processors). Just about all other processors will revert to the field-based motion adaptive approach for 1080i. Other advancements to motion adaptive deinterlacing include extending the movement analysis to more than just the next field in line. A large buffering and processing architecture is required for this, but it does allow the processor to make an even better informed judgement of movement in the whole scene and thus rule out one-off errors and the like.

Diagonal Interpolation of the deinterlaced signal
The more advanced processors all employ a diagonal interpolation step after deinterlacing the two fields together. This filter is applied to the newly deinterlaced frame to remove artefacts such as “jaggies”, which are created in areas of the picture where interpolation has had to take place on curved edges, leaving them looking stepped. By analysing the pixels along various diagonal lines, the processor can pick up on different shapes and angles that might otherwise be missed by just comparing the lines above and below. This is most useful when the signal is also being upscaled, where the jagged edges can be exaggerated even further. It is most commonly used to smooth out curves in an image. While many processors have this skill, implementations vary and some do come out better than others.

the scaling process

As we have seen, the video processor in your system can be presented with a multitude of resolutions. Most signals in the UK will be PAL 720 x 576 resolution, or high definition 1280 x 720 and 1920 x 1080. The problem here is that most displays are made up of a different configuration of pixels to this. The processor must increase (upscale) or decrease (downscale) the number of pixels in the image in order to fit the different incoming resolutions to the fixed resolution of the display. This is not as simple as it might sound. Making an object exactly 2 pixels high by 2 pixels wide fit a grid of pixels 3 high and 3 wide simply doesn't work. Imagine YouTube, for example: as you make the video window bigger, the picture gets softer and worse. The same goes for enlarging a photo on your computer. So more analysis must take place to ensure that when pixels are added (or removed) the final result looks the same as it did to start with. Complex mathematics (that I barely understand, let alone could explain) analyse each pixel, and then a number of pixels surrounding it. Each pixel, and its relationship to the surrounding pixels, is given a weighting relative to the importance of that pixel within the image. This is analysed both in the original pixel resolution and at the target location for that pixel in the output resolution. The surrounding pixels for each individual pixel overlap one another, so when expanded to the new output resolution the scaler decides which original pixel the surrounding ones were most highly weighted to, and then applies the colour detail from said pixel. Simple eh?
As the image above shows, just scaling 2x2 to 3x3 cannot be done without losing something (usually sharpness). Thankfully, on a bigger scale the result is not such a bad representation, but the theory holds true for all images. In the 3x3 box there are essentially four overlapping 4-pixel blocks, each 4-pixel block equating to a single pixel in the original. This would be known as 4-tap scaling, since each pixel analyses four taps around it. Basic processing might use such a simple 4-tap approach, but most of today's advanced processors would use 16 or more taps (the HQV chip uses 1,024!!!!). So, taking the top middle pixel in our example: the top-left pixel of our original would dictate that the target pixel should be black, whereas the top-right pixel of the original would want it to be green! The scaler chooses an average, depending on whether it has weighted the green pixel as more or less important than the black one, with varying success depending on how many taps are involved. As with deinterlacing, the more advanced scaler would go on to use such techniques as motion adaptive analysis (studying for movement) or temporal analysis (studying the same pixel over different periods of time) to improve the accuracy of the weightings employed.
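The 2x2 to 3x3 example can be tried out with a small Python sketch. It uses plain bilinear interpolation, which is a standard textbook method chosen here for illustration and not necessarily what any given scaler chip does: every output pixel is a distance-weighted average of up to four neighbouring source pixels. The function name `scale_bilinear` and the use of single brightness values instead of colours are assumptions made for the example.

```python
def scale_bilinear(image, out_height, out_width):
    """Resize a small greyscale image with bilinear interpolation.

    Each output pixel is mapped back to a position in the source image
    and takes a weighted average of the (up to) four pixels around it.
    """
    in_height, in_width = len(image), len(image[0])
    result = []
    for r in range(out_height):
        y = r * (in_height - 1) / (out_height - 1) if out_height > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_height - 1)
        fy = y - y0  # how far we sit between the two source rows
        row = []
        for c in range(out_width):
            x = c * (in_width - 1) / (out_width - 1) if out_width > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_width - 1)
            fx = x - x0  # how far we sit between the two source columns
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        result.append(row)
    return result


# 0 stands for a black pixel and 100 for a fully green one.
checker = [[0, 100],
           [100, 0]]
scaled = scale_bilinear(checker, 3, 3)
```

The corners of `scaled` keep their original values, but every pixel in between comes out at 50, halfway between black and green. That is the loss of sharpness described above: the scaler has no way of placing a hard edge on a pixel that did not exist in the source.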

other features a video processor might have

Noise reduction
Aside from deinterlacing and scaling, and doing these processes well, there is still more to generating the ultimate picture. Noise (or “grain”) is something that will plague all images and can be picked up through broadcast, cabling, disc error, pick-up, or even in the recording studio or on the set. The simplest filtering used (and used quite commonly) is to filter out every instance of a single, seemingly random pixel (or pair of pixels). This is based on the theory that information in the picture is not likely to be just one pixel in size, and so a one-pixel element must be unwanted noise. Unfortunately this technique often removes detail too, so the image, while apparently free from unwanted noise, is also quite “soft”. Skin tones may not look right where tiny blemishes have simply been taken out of the image. This process comes under the umbrella of “spatial filtering”, where the noise reduction takes place over space, i.e. over a single 2D frame.

A “temporal filter” might analyse a questionable pixel in the various fields before or after it in time. Should the suspected noise element appear in only one frame then it might be noise and will be filtered out; if it appears in several other frames before and after, then it is more likely to be detail and should be kept. This has very good results and is a technique often found only in the better after-market processors and higher-end displays (better processors offering smaller areas of analysis and/or more buffering for temporal study). However, that pixel of noise might also be a part of a moving object within the image. As with deinterlacing, the ideal would be to carry out a (per-pixel) motion adaptive temporal filter to analyse those pixels for motion, as well as existence over time.
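The basic temporal idea can be sketched in Python with a three-frame median, a common textbook filter used here purely as an illustration. Frames are assumed to be small grids of brightness values and the function name `temporal_median` is invented for the example. Note that this simple version has no motion analysis, so it would also erase a genuinely moving one-pixel object, which is exactly the weakness described above.

```python
from statistics import median


def temporal_median(previous, current, following):
    """Filter the current frame using the frames either side of it.

    For every pixel position the middle value of the three frames is
    kept, so a value that appears in only one frame is thrown away while
    a value present in at least two of the three frames survives.
    """
    return [
        [median((p, c, n)) for p, c, n in zip(prev_row, cur_row, next_row)]
        for prev_row, cur_row, next_row in zip(previous, current, following)
    ]


before = [[10, 10], [10, 80]]
now = [[10, 200], [10, 80]]   # 200 is a one-frame speck of noise
after = [[10, 10], [10, 80]]

cleaned = temporal_median(before, now, after)
```

Here the speck of value 200 is replaced by 10 because it exists in only one frame, while the detail of value 80 is kept because it persists across all three.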
The deepest analysis (and analysis of higher resolution images) involves massive processing power, so a full per-pixel, motion-adaptive, noise-adaptive temporal filter is reserved for only the highest-powered processors, especially for HD signals.

Detail Enhancement
Detail enhancement is another area of alleged improvement. Initially all digitised signals are filtered to prevent inaccuracies such as false colour or moiré effects. This is known as anti-aliasing, and it has the unfortunate effect of blurring some detail. To combat this, edge-enhancement might be employed to artificially sharpen these blurred areas. It works on the theory that the human brain perceives sharpness as the contrast between dark and light. Surrounding an object with a white edge therefore makes it appear sharper to the eye; however, done without any grace it will simply look like an object with a white line around it!! This artefact is known as “ringing” or “haloing” and is a common side-effect of poor edge-enhancement. For low quality sources (e.g. SD TV broadcasts) edge-enhancement can benefit the viewing experience, but on good material it can only be a hindrance. Unfortunately it might already be inherent in your signal before it gets to your home, since edge-enhancement is sometimes applied in the studio… A better processor will use an analysis step to identify the more blurred areas of the image and apply edge-enhancement only there, perhaps more conservatively than the brute-force technique a simpler processor might use. When combined with the analytical steps in the scaling process, the results will be true to the original.

Gamma and Colour adjustment
Many processors will include adjustment for the RGB gamma and overall luma of the image. This enables the system calibrator to make precise adjustments to white accuracy at different grayscale levels, leading to perfect gamma tracking at all luminance levels of the image.
The image is then colour accurate in both dark and light areas, and transitions between brightness levels are smooth and accurate. A processor might also include a colour management system (CMS). This allows a calibrator to manage the quality of colour, i.e. to ensure red is red and not orange or pink. This is already touched on by adjusting saturation and hue controls, and by setting accurate white points with gamma correction controls. But the final stage is to perfect exactly where the processor maps each of the primary (and in more advanced CMSs the secondary) colours for the ultimate in life-like imaging.

Functional and Operating Tools
As you may have found with your existing TV, there will be a selection of different memories or modes to choose from depending on what you are displaying. You might select a “game” mode when using game consoles which have a duller output than your other video sources. A more advanced processor will allow the various adjustments and settings for colour and grayscale, edge-enhancement, noise reduction, and even the final output resolution to be configured differently for each individual input. You might have multiple output memories for day and night use, where differing ambient light levels will require different grayscale characteristics. Or, more commonly, you might have a projector and a plasma screen both being driven by your processor, so all the various inputs would require a double set of output settings, each tailored to either the projector or the plasma screen.