Applying Modern Machine Vision Technologies to Security

Video surveillance has proven itself to be an advanced sensor with clear benefits. Serving as a remote set of eyes, video surveillance allows a virtual presence in off-site locations from a single point. What's more, video cameras cover a large contiguous swath of view, allowing a panning camera to steadily and consistently sweep a search pattern.

Video systems can also function in locations where humans cannot. The earliest known video surveillance technology was used to safely monitor the development and launch of V-2 rockets in 1942. From a safe distance, scientists and engineers could observe performances and identify failures.

Since then, video systems have acted as an extension of our eyes and ears. A steady stream of technology developments and manufacturing advancements has taken video surveillance to such a level that we feel comfortable relying on it for security purposes.

First Light

Light-sensitive materials can change their resistance or conductance based on the presence or absence of light. Early monochrome video systems like the RCA Vidicon camera system of the 1950s involved vacuum tubes featuring a light-sensitive selenium plate that acted as the focus of the image to be sensed.

An electron beam would scan the plate and the resulting current was directly proportional to the amount of light hitting that section of the plate at that exact time. Thus, the raster-scanned tube produced a rudimentary electronic video signal that could easily be transmitted long distances. The CRT television took this signal in reverse order, scanning a phosphor screen with the electron beam to re-create corresponding light levels in the image.

For decades, video was limited to monochrome sensing and displaying of images in real time. Color filters in front of each sensor limited the analog level to the intensity of the constituent colors in order to create color sensors. Color phosphors placed in the path of the electron beam were used to create colors. The advent of colorburst crystals helped to synchronize color components in the video signals.

Steady advances improved these tubes over time, bringing better resolution, lower power, lower-cost manufacturing, and higher reliability. The Closed Circuit Television (CCTV) and broadcast industries were born and drove development at an even quicker pace.

On the downside, these technologies used fragile glass, and the circuitry they needed ran at higher voltages. Tube-based image sensors were also large, bulky assemblies. Thanks to modern semiconductor technology, this is no longer the case.

Solid-State Sensors

Charge-Coupled Devices (CCDs) entered the scene in the early 1970s, combining semiconductor manufacturing with highly disciplined arrays refined through the use of memory devices. Individual light-sensitive picture-element sensors in the array synchronously set the state of a flip-flop. In turn, these sensors connected in a daisy-chain architecture coupled like a shift register. Clocking the shift register generates a synchronous video stream.
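The shift-register readout described above can be sketched in a few lines of Python. This is a simplified model rather than a description of any particular device: the function names are illustrative, and the integers stand in for the sampled light level at each picture element.

```python
from collections import deque

def read_out_line(levels):
    """Clock one line of picture elements out of a shift register."""
    register = deque(levels)          # one cell per picture element
    stream = []
    for _ in range(len(levels)):      # one clock pulse per picture element
        stream.append(register.popleft())  # cell at the output end is read
        register.append(0)                 # the rest shift along; an empty cell enters
    return stream

def read_out_frame(frame):
    """Read a two-dimensional array line by line into one serial stream."""
    stream = []
    for line in frame:
        stream.extend(read_out_line(line))
    return stream

print(read_out_frame([[1, 2], [3, 4]]))
```

Each clock pulse moves every stored level one cell closer to the output, so a two-dimensional image leaves the sensor as a single synchronous stream of samples.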

Initially used as one-dimensional sensor arrays for applications like scanners and fax machines, two-dimensional and eventually color versions of CCDs emerged, allowing video image sensors to dramatically shrink in size, while also simplifying power requirements.

Humans Need Not Observe

Because recording technologies were not yet available, early CCTV systems needed human observers who had only one chance to extract as much information from a detected event as possible. After that, the images were lost forever.

In CCTV systems like this, it is the human who recognizes patterns, detects activities of interest, and makes the decision to create an alert or not. In essence, a human is the control processor in an alarm loop who makes the one-bit decision to trip an alarm machine or not.

Linear CCD sensors began to change this with the programmed ability to read bar codes and recognize patterns. Two-dimensional sensors used in modern phones, cameras, and machine-vision systems have extended resolutions and spectral sensitivities while reducing size, power, and the need for external lens assemblies.

Machine vision is joining with artificial intelligence to spawn a new generation of capable surveillance systems that require fewer personnel, lower costs, and higher levels of programmed detection (and tracking) of targets. These requirements raise the bar for design engineers, who now need to integrate functions at a higher level with more processing power than ever before.
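To make the one-bit alarm decision concrete, here is a minimal sketch of how software might take over the role of the human observer. Simple frame differencing is used purely as an illustration and is not a technique named in this article. The function names and both thresholds are hypothetical, and frames are assumed to be equal-sized lists of 8-bit grayscale rows.

```python
def changed_fraction(previous, current, pixel_threshold=25):
    """Fraction of pixels whose level moved by more than pixel_threshold."""
    total = 0
    changed = 0
    for prev_row, cur_row in zip(previous, current):
        for prev_px, cur_px in zip(prev_row, cur_row):
            total += 1
            if abs(cur_px - prev_px) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0

def alarm(previous, current, area_threshold=0.05):
    """The one-bit decision: trip the alarm (True) or not (False)."""
    return changed_fraction(previous, current) > area_threshold

quiet = [[10] * 4 for _ in range(4)]
intruder = [row[:] for row in quiet]
intruder[1][1] = intruder[1][2] = intruder[2][1] = intruder[2][2] = 200

print(alarm(quiet, quiet), alarm(quiet, intruder))
```

A fielded system would need far more than this, such as noise filtering, lighting compensation, and target tracking, which is where the added processing power comes in.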

Design Issues and Concerns

Without the speeds and densities of modern memory devices, and without the horsepower of modern embedded processors, the next generation of smart surveillance systems could not be designed at reasonable costs and sizes. One main reason is the burden that every increase in image resolution imposes on the rest of the system design.

Older 8-bit 4-MHz legacy processors were fine for helping designers pioneer digital control loops, as well as digital techniques for signal processing and real-time control, but are just not fast enough to tackle the needs of smart security. This basically boils down to the rapid growth of memory requirements.

For example, a simple legacy composite video camera had 525 total scan lines that could be sampled at various rates for each line. Twenty-one of those lines were used for vertical blanking. Modern CCD image sensors start at digital resolutions of ¼ VGA (320 x 240) at the low end.

At ¼ VGA resolutions, 76,800 bytes are needed to represent a single frame (at 8-bit resolution). With 8-bit RGB (one byte each for red, green, and blue), this rises to 230,400 bytes. In both cases, this is beyond the addressable range of legacy processors.
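A quick check of the addressing claim, as a sketch only. It assumes a 16-bit address bus (64 Kbytes of directly addressable memory), which is typical of 8-bit processors of that era but is not stated in this article. The helper names are illustrative.

```python
LEGACY_ADDRESS_BITS = 16  # assumption: 16-bit address bus on a legacy 8-bit processor

def frame_bytes(width, height, bytes_per_pixel=1):
    """Bytes needed to hold one uncompressed frame."""
    return width * height * bytes_per_pixel

def fits_in_address_space(num_bytes, address_bits=LEGACY_ADDRESS_BITS):
    """True if the frame fits within the directly addressable range."""
    return num_bytes <= 2 ** address_bits

mono = frame_bytes(320, 240)      # 8-bit monochrome
rgb = frame_bytes(320, 240, 3)    # one byte each for red, green, and blue

print(mono, fits_in_address_space(mono))  # 76800 False
print(rgb, fits_in_address_space(rgb))    # 230400 False
```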

Memory requirements rise dramatically as resolutions increase. Even at a VGA resolution of 640 x 480, a single monochrome frame needs 307,200 bytes, and color needs almost 1 Mbyte per single frame at 24-bit color palettes.

It doesn't stop here. At 30 frames per second, a typical flicker-fusion rate, almost 28 Mbytes are needed to buffer a single second of VGA video. A comparison of a few common video standard resolutions highlights this increasingly demanding constraint.

Horizontal  Vertical  Picture Elements  Single Frame, 24-bit (bytes)  1-Second Buffer, 30 fps (bytes)  Standard
320         240       76,800            230,400                       6,912,000                        1/4 VGA
640         480       307,200           921,600                       27,648,000                       VGA
800         600       480,000           1,440,000                     43,200,000                       SVGA
1024        768       786,432           2,359,296                     70,778,880                       XGA
1280        768       983,040           2,949,120                     88,473,600                       WXGA
1280        1024      1,310,720         3,932,160                     117,964,800                      SXGA
1400        1050      1,470,000         4,410,000                     132,300,000                      SXGA+
2048        1536      3,145,728         9,437,184                     283,115,520                      QXGA
3200        1800      5,760,000         17,280,000                    518,400,000                      WQXGA+
4096        3072      12,582,912        37,748,736                    1,132,462,080                    HXGA
7680        4800      36,864,000        110,592,000                   3,317,760,000                    WHUXGA

Table 1: Increased resolutions impose dramatic memory requirements for video-intensive processing, storage, and transmission.
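The figures in Table 1 follow from simple multiplication, sketched below for a few of the rows. The function name is illustrative. The 24-bit color depth and the 30 frames per second are the values used in the text.

```python
BYTES_PER_PIXEL = 3      # 24-bit color
FRAMES_PER_SECOND = 30

def table_row(width, height):
    """Return (picture elements, single-frame bytes, one-second buffer bytes)."""
    pixels = width * height
    frame = pixels * BYTES_PER_PIXEL
    return pixels, frame, frame * FRAMES_PER_SECOND

standards = {"1/4 VGA": (320, 240), "VGA": (640, 480), "WHUXGA": (7680, 4800)}

for name, (width, height) in standards.items():
    pixels, frame, second = table_row(width, height)
    print(f"{name}: {pixels:,} pixels, {frame:,} bytes/frame, {second:,} bytes/second")
```

These are uncompressed figures. Any compression applied before buffering or transmission would reduce them.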

Blazing Processors

Several advanced processor families and architectures have evolved alongside video technologies and are now ready to handle the next generations of smart surveillance and video designs. In all cases, external bus interfaces and most likely high-speed external DRAM will be used. Most advanced processors can address several gigabytes of memory and support several synchronous high-speed memory interfaces like DDR and SDR. Designers need to keep memory bandwidth in mind when architecting a system.

Figure 1: Transfer speeds and access times become increasingly important as resolutions increase. This translates into faster processors and memory sub-systems to capture and buffer image data until it can be transported to a hub or aggregator.

Two main applications exist, and each drives processor selection with different requirements. The design of a central hub and/or aggregator necessitates very high-end or even multi-core processors, DVR functionality, and very deep pools of both volatile and non-volatile memory. However, a different set of constraints exists inside the actual cameras, including lower power, extended temperature ranges, and smaller sizes.

Running at 48 MHz with single-cycle instructions, the 32-bit ARM architecture in STMicroelectronics' STM32F051K4U6TR Cortex-M0 processor has the data path and bus widths to handle entire picture element samples in a single transfer. It also runs down to 1.8 volts with a -40°C to +85°C operating range in a 5mm x 5mm package.

This processor has hardware direct memory access (DMA) that can handle five channels for memory-to-memory and memory-to-peripheral transfers. The built-in HDMI controller interface operates at lower speeds with minimal memory. A special clock domain for video is independent of the processor's main clock.

The processor can be coupled with modern high-speed, dense memory devices like the ISSI DDR3 IS43TR16256AL-15HBLI. This device provides 4 Gbits (256 Meg x 16 bits) of memory at 1,333 MHz in a single part. Keep in mind that the processor is not the only block that needs to access and manipulate memory. The communications controller is an integrated part of this equation as well.
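A rough way to budget memory bandwidth is to compare the traffic a video buffer generates against the theoretical peak of the memory device. The sketch below rests on several assumptions that are not from this article. It reads "1,333 MHz" as 1,333 million 16-bit transfers per second. It ignores refresh and bus turnaround overhead. It counts only two accesses per pixel, one write by the capture path and one read by the communications controller. The function names are illustrative.

```python
def peak_memory_bandwidth(transfers_per_second, bus_width_bits):
    """Theoretical peak in bytes per second: one bus-wide word per transfer."""
    return transfers_per_second * bus_width_bits // 8

def buffer_traffic(width, height, bytes_per_pixel=3, fps=30, accesses_per_pixel=2):
    """Bytes per second moved when each pixel is written once and read once."""
    return width * height * bytes_per_pixel * fps * accesses_per_pixel

# Assumed reading of the device rating: 1,333 million transfers/s on a x16 bus
PEAK = peak_memory_bandwidth(1_333_000_000, 16)

standards = {"VGA": (640, 480), "QXGA": (2048, 1536), "WHUXGA": (7680, 4800)}
for name, (width, height) in standards.items():
    traffic = buffer_traffic(width, height)
    print(f"{name}: {traffic / PEAK:.1%} of theoretical peak")
```

Under these assumptions, the largest format in Table 1 already exceeds what a single device of this kind could deliver, before any image processing touches the data.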

Transport and Linkage

Communications requirements can be a major part of next-generation surveillance system design challenges. With so much data, transfer speeds need to rise as steeply as memory requirements do. Distance becomes an issue too. Even popular point-to-point 100 Mbit/sec Ethernet connections have limitations when driving long distances over CAT-style cabling.
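As a back-of-the-envelope check, the raw bit rate of an uncompressed stream can be compared with the nominal rate of a 100 Mbit/sec link. This sketch ignores protocol overhead and compression, both of which matter in practice. The function names are illustrative.

```python
ETHERNET_BITS_PER_SECOND = 100_000_000  # nominal 100 Mbit/sec link rate

def video_bits_per_second(width, height, bits_per_pixel=24, fps=30):
    """Raw bit rate of an uncompressed video stream."""
    return width * height * bits_per_pixel * fps

def fits_link(width, height, link_bits_per_second=ETHERNET_BITS_PER_SECOND):
    """True if the raw stream fits within the nominal link rate."""
    return video_bits_per_second(width, height) <= link_bits_per_second

print(video_bits_per_second(320, 240), fits_link(320, 240))  # 55296000 True
print(video_bits_per_second(640, 480), fits_link(640, 480))  # 221184000 False
```

By this estimate, uncompressed ¼ VGA fits within a 100 Mbit/sec link, while uncompressed VGA already needs more than twice that rate.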

Figure 2: Even at lower video resolutions, data transfer speeds can degrade rapidly over distances using established low-cost media like CAT-style twisted pair cabling.

This also means that shorter-distance data links like DVI and HDMI are not usable over these distances. The same holds true with many of the modern multimedia interconnection standards like S/PDIF and Toslink. While S/PDIF is digital and can be extended through the use of many standard driver and receiver chips, required data bandwidths are only going up, translating to a need for more elaborate and expensive driver technologies going forward.

Cameras with gigapixel resolutions are already available and in use. With the security and accountability concerns in modern society, people can begin to expect higher and higher resolutions to keep an eye on what they are doing. While fiber-optic links support higher bandwidths, copper has typically been cheaper to deploy. As a result, copper-based link technology is being developed and standardized for next-generation remote security systems.

One interesting technology that is rising to meet the bandwidth limitation challenge is the CoaXPress standard. CoaXPress is a coaxial cable-based standard for high-speed (6.25 Gbits/sec) point-to-point communications links over distances of up to 130 meters. Multiple channels can raise the data rate to 25 Gbits/sec.
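The channel count needed for a given format can be estimated from the rates quoted above. This is a sketch under stated assumptions: it uses the raw 6.25 Gbits/sec line rate with no allowance for encoding or protocol overhead, it assumes uncompressed 24-bit video at 30 frames per second, and it takes four channels as the ceiling because that is what the 25 Gbits/sec figure implies. The function name is illustrative.

```python
CHANNEL_BITS_PER_SECOND = 6_250_000_000  # 6.25 Gbits/sec per coaxial channel
MAX_CHANNELS = 4                         # implied by the 25 Gbits/sec aggregate figure

def channels_needed(width, height, bits_per_pixel=24, fps=30):
    """Smallest number of channels whose combined raw rate covers the stream."""
    required = width * height * bits_per_pixel * fps
    return -(-required // CHANNEL_BITS_PER_SECOND)  # ceiling division on integers

standards = {"VGA": (640, 480), "HXGA": (4096, 3072), "WHUXGA": (7680, 4800)}
for name, (width, height) in standards.items():
    needed = channels_needed(width, height)
    verdict = "fits" if needed <= MAX_CHANNELS else "exceeds four channels"
    print(f"{name}: {needed} channel(s), {verdict}")
```

By this estimate, a single channel comfortably carries the lower resolutions in Table 1, while the very largest format would need compression or a lower frame rate even with all four channels in use.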

Chipsets like the Microchip EQCO62X20 (EQCO62R20.3 = Receiver, EQCO62T20.3 = Transmitter) can be used to form a bi-directional full-duplex communications channel over a single coaxial cable. Using external inductors, power can also be transferred over the same cable. Low power (<70 mWatt at 1.2 Volt) is desirable here too, as is the small 4mm fine-pitch QFN packaging.

Figure 3: Microchip Technology's EQCO62X20 chipset is compatible with the CoaXPress v1.0 Camera standard.

What About Wireless?

While wireless connectivity is possible, distance and bandwidth limitations make this unfeasible for most surveillance applications. In addition, wireless links are more easily jammed, making security applications vulnerable to attacks.

It may be possible to use short-hop wireless links to local aggregators and hubs. This is especially of interest if cloud-based connectivity is desired. It's not too far-fetched a notion that all civic cameras will become a part of the Internet of Things (IoT) in the very near future. This would mean that everyone would have access to all public surveillance devices to level the playing field and assure that no abuse of power or brutality will go unnoticed.

Conclusions

Security and surveillance are increasingly a part of our everyday lives. We are photographed, taped, and monitored in nearly everything we do. The sheer number of cameras and video feeds has driven the need for security personnel to sit and watch and make determinations.

Today's hardware systems, both cameras and video aggregators, use and will continue to use higher-level processing and even AI to automate data gathering and security assessments.

Smile, the camera focused on you is not so candid!

About the Author

After completing his studies in electrical engineering, Jon Gabay has worked with defense, commercial, industrial, consumer, energy, and medical companies as a design engineer, firmware coder, system designer, research scientist, and product developer. As an alternative energy researcher and inventor, he has been involved with automation technology since he founded and ran Dedicated Devices Corp. up until 2004. Since then, he has been doing research and development, writing articles, and developing technologies for next-generation engineers and students.