This is the first "chapter" of what will be a small series about computer vision—a very extensive area of technological knowledge that has caught my attention. I do not plan to follow a content script, so I will decide what to publish as time goes on. I am dedicating a lot of effort and care to studying it, so I hope that, as a reader, you enjoy it, share it, and, above all, learn something new each time.
Figure 1. Real-time social distancing monitoring in COVID times. Source: deepnote.com
An introduction to computer vision
In this first introduction, the aim is to define the basic concepts of computer vision, thus creating a solid foundation upon which we can build for future, more complex publications. What is computer vision? What does it entail? What are its applications in the real world?
1. Concept and contextualization
Like almost every technology, computer vision relies on knowledge from many other fields such as mathematics, physics, computing, electronics, robotics, and others, depending on the problem to be solved.
Computer vision is the discipline that enables computing systems to process, analyze, and interpret images. In a way, it aims to emulate human visual understanding.
At the outset, a machine cannot inherently understand anything about an image because it lacks self-awareness. When we look at the scenery outside our window, we can accurately distinguish the different elements that make up the scene. We recognize a tree, a vehicle, or a cat, even without seeing its full shape.
Imagine it is summer, the sun is shining brightly, and you see a thin, elongated, furry object moving gently from side to side. It seems to be part of something larger and is behind a car wheel. So, is it... a cat? A dog? A snake? A rope? We would know it is a cat from the way its tail moves and because cats often hide under cars when it is hot, among other clues.
This is because we can recognize and deduce, to understand what something is and is not. We learn throughout our lives and are aware of our surroundings and our place within them. By nature, we are intelligent enough to make relatively complex deductions.
Figure 2. Bird detection. Source: myrobotlab.com
To understand how a computer vision program can infer knowledge automatically from an image, we first need to clarify that it does not work with the image as a whole. The smallest indivisible unit of information it operates on is the pixel. In the 24-bit RGB model, each pixel stores 8 bits for each of its red, green, and blue channels, so it can represent one of 2^24 = 16,777,216 possible colors.
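To make this concrete, here is a minimal sketch in Python. It assumes the Pillow library is installed and uses a hypothetical file name, scene.jpg; both are my own choices for illustration, not part of the original post. It simply shows the raw material a vision program actually receives: a grid of pixels, each one a triplet of 8-bit values.

```python
# Minimal sketch: the raw material a vision program works with is a grid of
# pixels, not "an image" in the human sense. Assumes the Pillow library is
# installed and that "scene.jpg" (a hypothetical example file) is available.
from PIL import Image

image = Image.open("scene.jpg").convert("RGB")
width, height = image.size

# Each pixel is a triplet of 8-bit red, green, and blue intensities,
# i.e. one of 256 * 256 * 256 = 16,777,216 possible colors.
r, g, b = image.getpixel((0, 0))
print(f"Image size: {width} x {height} pixels")
print(f"Top-left pixel: R={r}, G={g}, B={b}")
```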
By analyzing the set of pixels that make up an image and using various computational algorithms, the machine can identify different shapes across the two-dimensional space of the image. These algorithms do not work independently; rather, they are tightly synchronized and form part of a much larger information-processing framework. Due to its complexity, I will leave this point as a "black box" for future publications.
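Without opening that black box, the following sketch illustrates just one classic building block that many shape-finding approaches rely on: extracting edges and grouping them into candidate contours. It assumes the opencv-python package is installed and again uses the hypothetical scene.jpg file; it is an illustration of the general idea, not the specific framework described above.

```python
# Illustration only, not the full pipeline (which stays a "black box" here):
# one classic building block that many shape-finding approaches use.
# Assumes the opencv-python package is installed and that "scene.jpg" is a
# hypothetical example image.
import cv2

image = cv2.imread("scene.jpg")                 # pixel matrix in BGR order
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # collapse color to intensity
edges = cv2.Canny(gray, 100, 200)               # mark strong intensity jumps

# Group connected edge pixels into candidate shapes (contours) in the 2D plane.
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"Found {len(contours)} candidate shapes")
```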
Computer vision is computationally intensive, as all the stages through which information passes, from image capture to interpretation, typically need to run in real time. This temporal demand makes algorithmic efficiency and processing speed central challenges.
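The sketch below gives a feel for that time budget. It assumes opencv-python is installed and that a webcam is available at index 0 (an assumption on my part); it times a stand-in processing step against the roughly 33 ms available per frame in a 30 fps stream.

```python
# Sketch of the real-time constraint: each frame has to be processed within a
# fixed time budget (roughly 33 ms per frame for a 30 fps stream).
# Assumes opencv-python is installed and a webcam is available at index 0.
import time
import cv2

BUDGET_MS = 1000 / 30                      # ~33.3 ms per frame at 30 fps
capture = cv2.VideoCapture(0)

for _ in range(100):                       # sample 100 frames and time each one
    ok, frame = capture.read()
    if not ok:
        break
    start = time.perf_counter()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # stand-in processing step
    edges = cv2.Canny(gray, 100, 200)
    elapsed_ms = (time.perf_counter() - start) * 1000
    status = "within budget" if elapsed_ms <= BUDGET_MS else "too slow"
    print(f"{elapsed_ms:.1f} ms ({status})")

capture.release()
```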
2. Application areas
Many studies suggest that computer vision is currently booming due to the vast number of applications it has. The volume of visual content generated in recent decades has fueled the emergence of new applications and, consequently, the development of newer and better algorithms. However, computer vision is still in the early stages of its development.
Applying computer vision algorithms to any specific area or problem is inherently a complex task. These algorithms must be custom-designed to meet high standards of precision, robustness, and speed. Some applications in various fields include:
- Agriculture: automatic control and detection of weeds and pests; identification of plant species.
- Astronomy: observation of cosmic elements (stars, planets, satellites, etc.).
- Meteorology: tracking and forecasting of meteorological phenomena (rain, snow, hurricanes, wind, etc.) using satellite imagery.
- Biology: determination of certain traits in plant or animal species based on their texture, color patterns, size, shape, etc.
Figure 3. Interaction between living beings
- Medicine: diagnosis of pathologies using ultrasound, X-rays, mammograms, MRIs, etc.
- Microscopy: recognition and counting of cells or microorganisms in microscopic images taken from a sample.
Figure 4. Cancerous cell detection. Source: firstxw.com
- Geology: detection of ground movements (landslides, rockfalls, etc.) and analysis of geological formations and measurements.
- Quality control and inspection: quality monitoring in fruit and vegetable processing (impurities, poor appearance, production defects, etc.).
Figure 5. Flaw detection on fruit. Source: interempresas.net
- 3D modeling and visualization: reconstruction of 3D models from two-dimensional images.
- Recognition and classification: recognition of faces, vehicles, and other objects. This area often inspires many others, such as facial recognition applied to device authentication.
- Robotics: automatic machine guidance, such as autonomous vehicle driving or the automation of exploration robots on other planets.
- Security: detection of movement and foreign objects; facial, fingerprint, and iris recognition.
- Automotive: parking assistance, visualization of a vehicle's position in space, etc. This could also fall under the robotics category.
- Accessibility: user interface development based on eye or gesture detection for people with motor limitations.
Figure 7. Blink detection. Source: pyimagesearch.com
- Remote sensing: detection of various phenomena from aerial or satellite imagery (deforestation, tracking of animal migration, wildfires, etc.).
3. References
Ben G. Weinstein (2017). A computer vision for animal ecology. British Ecological Society. Retrieved July 2021 from https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2656.12780
Iván García S., Víctor Caranqui S. (2015). La visión artificial y los campos de aplicación. Universidad Politécnica Estatal del Carchi, Ecuador. Retrieved July 2021.
Mouna Afif, Yahia Said, Mohamed Atri (2020). Computer vision algorithms acceleration using graphic processors NVIDIA CUDA. Springer. Retrieved July 2021.
Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, Victor Eruhimov (2012). Real-Time Computer Vision with OpenCV. Communications of the ACM. Retrieved June 2021 from https://cacm.acm.org/magazines/2012/6/149789-real-time-computer-vision-with-opencv/fulltext
This post is also available in Spanish at "Visión artificial #1 | Breve introducción a la visión artificial".