import { Box, Heading, Kbd, Text, Image } from '@chakra-ui/react';
import React, { useEffect } from 'react';
import MathJaxTex from '../../components/MathJaxTex';
import { useLocation } from 'react-router-dom';

const FeelThePixels: React.FC = () => {
	const { pathname } = useLocation();

	useEffect(() => {
		window.scrollTo(0, 0);
	}, [pathname]);

	return (
		<Box p={5}>
			<Heading>Feel the Pixels</Heading>
			<Text>
				Computer Vision (CV), as the name suggests, is about computers
				developing vision, not literally, but programmatically. Humans
				capture the world through their eyes and then understand,
				explain, and reason about their surroundings. The camera is the
				eye of a computer, and computer vision is all about making
				sense of the images and video captured with a camera. Some
				examples: mobile cameras predicting your age, an app applying
				cool filters and effects to images, software analyzing complex
				retina scans to predict diabetic retinopathy, or software
				analyzing CT scans to identify a brain hemorrhage. When I say
				software, it doesn’t necessarily mean just a computer program
				or app; it could even be an AI (Artificial Intelligence) model.
				Solving computer vision problems, with either traditional or
				AI-based approaches, requires a deep understanding of math,
				signal processing, and, obviously, programming. In the past two
				years, I have interviewed many
				students, even those with master’s degrees from top colleges.
				Often, I have seen people dive straight into the programming
				part, copying code from Stack Overflow, ChatGPT, or Medium
				articles. Sometimes it works and sometimes it doesn’t. In either
				case, they couldn’t explain why something is the way it is. At
				a high level, we could infer that they lack the mathematical
				skills required to solve the problem. In some cases, even when
				people know the math, or after the math is explained to them,
				they still struggle to answer the questions. From what I have
				seen, the underlying problem is that people don’t really
				understand image representation or don’t have the right
				intuition for an image. In this short blog, we’ll look into
				really understanding what an image is, a.k.a. feeling the
				pixels.
			</Text>
			<Image
				width="250px"
				src="https://pub-17b7496e137e40fcbe7057d6a4735482.r2.dev/feel-the-pixels-article:1.png"
				alt="feel-the-image1"
			/>
			<Text fontSize="2xs" as="i" color="gray.500">
				Image source: generated with a Stable Diffusion model
				<br />
				Prompt: A human being trying to see, visualize and feel a pixel
				in an abstract image
			</Text>
			<Text mt={5}>
				In simple terms, a pixel is the smallest unit of an image, and
				every pixel has a color value associated with it. In a color
				image, that value is a combination of red (r), green (g), and
				blue (b), each quantized to an integer from 0 to 255 in the
				common 8-bit-per-channel format. Any other color can be derived
				by mixing these three base colors. For example,{' '}
				<Kbd color="blackAlpha.800">(r, g, b)</Kbd> values for white
				would be <Kbd color="blackAlpha.800">(255, 255, 255)</Kbd>, for
				black <Kbd color="blackAlpha.800">(0, 0, 0)</Kbd>, for red{' '}
				<Kbd color="blackAlpha.800">(255, 0, 0)</Kbd>, etc. In the
				color image below, every pixel in the highlighted area (see the
				red circle) has a color value of{' '}
				<Kbd color="blackAlpha.800">(153, 184, 198)</Kbd>.{' '}
				<Image
					mt={2}
					src="https://pub-17b7496e137e40fcbe7057d6a4735482.r2.dev/feel-the-pixels-article:2.png"
					alt="feel-the-image2"
				/>
				<Text fontSize="2xs" as="i" color="gray.500" mb={2}>
					Image source: generated with a Stable Diffusion model
				</Text>
				<br />
				For grayscale (non-color) images (see the figure below), each
				pixel is represented by a single integer in the range{' '}
				<Kbd color="blackAlpha.800">0-255</Kbd> (e.g., black is 0 and
				white is 255).
				<Image
					mt={2}
					src="https://pub-17b7496e137e40fcbe7057d6a4735482.r2.dev/feel-the-pixels-article:3.png"
					alt="feel-the-image3"
				/>
				<Text fontSize="2xs" as="i" color="gray.500" mb={2}>
					Image source: generated with a Stable Diffusion model
				</Text>
				<br />
				So, the first thing that should come to mind when we think about
				a pixel is its value: a color represented as{' '}
				<Kbd color="blackAlpha.800">(r, g, b)</Kbd> or a single
				grayscale number, with a lower bound of 0 and an upper bound of
				255.
			</Text>
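			<Text mt={5}>
				As a rough illustration, the pixel values described above can
				be written down in TypeScript. This is only a sketch: the names{' '}
				<Kbd color="blackAlpha.800">RGB</Kbd>,{' '}
				<Kbd color="blackAlpha.800">Gray</Kbd>, and{' '}
				<Kbd color="blackAlpha.800">isChannel</Kbd> are made up for this
				post and do not come from any library.
			</Text>
			<Box as="pre" mt={2} fontSize="sm" overflowX="auto">
				{`// Hypothetical names, for illustration only.
type RGB = [number, number, number]; // (r, g, b)
type Gray = number; // a single grayscale value

// A channel value is valid if it is an integer from 0 to 255.
const isChannel = (v: number): boolean =>
  Number.isInteger(v) && v >= 0 && v <= 255;

const white: RGB = [255, 255, 255];
const black: RGB = [0, 0, 0];
const red: RGB = [255, 0, 0];
const highlighted: RGB = [153, 184, 198]; // the circled area in the color image

const grayBlack: Gray = 0;
const grayWhite: Gray = 255;

console.log(highlighted.every(isChannel)); // true
console.log(isChannel(256)); // false: above the upper bound`}
			</Box>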
			<Text mt={5}>
				An image is made up of many pixels; how many depends on the
				image resolution. Image resolution is simply the{' '}
				<Kbd color="blackAlpha.800">
					number of pixel rows x number of pixel columns
				</Kbd>
				. The resolution of the image below is{' '}
				<Kbd color="blackAlpha.800">256 x 256</Kbd>, as there are 256
				rows of pixels and 256 columns of pixels.{' '}
				<Image
					mt={2}
					src="https://pub-17b7496e137e40fcbe7057d6a4735482.r2.dev/feel-the-pixels-article:4.png"
					alt="feel-the-image4"
				/>{' '}
				<Text fontSize="2xs" as="i" color="gray.500" mb={2}>
					Image source: generated with a Stable Diffusion model
				</Text>
				<br />
				In general, the number of pixel rows is the height of the image
				and the number of pixel columns is the width of the image. Apart
				from color, another important property of a pixel is its
				position in the image. The position of a pixel is given by{' '}
				<Kbd color="blackAlpha.800">(i, j)</Kbd>, where{' '}
				<Kbd color="blackAlpha.800">i</Kbd> is the row and{' '}
				<Kbd color="blackAlpha.800">j</Kbd> is the column in which the
				pixel sits. As said earlier, an image is made up of
				many pixels. Since each pixel has a position{' '}
				<Kbd color="blackAlpha.800">(i, j)</Kbd> and a value,{' '}
				<Kbd color="blackAlpha.800">(r, g, b)</Kbd> or grayscale, the
				mathematical representation of an image is a matrix with{' '}
				<Kbd color="blackAlpha.800">m</Kbd> rows and{' '}
				<Kbd color="blackAlpha.800">n</Kbd> columns, where{' '}
				<Kbd color="blackAlpha.800">m x n</Kbd> is the resolution of the
				image. Next time you see an image, think of it as a matrix with
				dimensions <Kbd color="blackAlpha.800">m x n</Kbd>, where each
				element is a pixel with a value of{' '}
				<Kbd color="blackAlpha.800">(r, g, b)</Kbd> or grayscale. The
				matrix below is for a color image (see the image that follows;
				ignore the gradient effect caused by resizing) with resolution
				5x3:
				<MathJaxTex
					text="$
						\left[
						\begin{array}{ccc}
						  (0, 0, 0) & (0, 0, 0) & (0, 0, 0) \\
						  (0, 0, 0) & (0, 0, 0) & (0, 0, 0) \\
						  (0, 0, 0) & (0, 0, 0) & (0, 0, 0) \\
						  (0, 0, 0) & (0, 0, 0) & (0, 0, 0) \\
						  (255, 0, 0) & (255, 0, 0) & (255, 0, 0)\\
						\end{array}
						\right]
					  $"
				/>
				<Image
					mt={2}
					width="150px"
					height="250px"
					objectFit="fill"
					src="https://pub-17b7496e137e40fcbe7057d6a4735482.r2.dev/feel-the-pixels-article:5.png"
					alt="feel-the-image5"
				/>
				<Text fontSize="2xs" as="i" color="gray.500" mb={2}>
					Image source: generated using NumPy; the 5x3 image is
					resized to 250x150 for better visibility
				</Text>
			</Text>
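			<Text mt={5}>
				The 5x3 matrix above can also be sketched as a nested array in
				TypeScript. This is a minimal sketch, assuming rows and columns
				are counted from 0; the names used here are made up for this
				post and are not from any library.
			</Text>
			<Box as="pre" mt={2} fontSize="sm" overflowX="auto">
				{`// Hypothetical names, for illustration only.
type RGB = [number, number, number]; // (r, g, b)

const black: RGB = [0, 0, 0];
const red: RGB = [255, 0, 0];

// 5 rows x 3 columns: four black rows, then one red row.
const image: RGB[][] = [
  [black, black, black],
  [black, black, black],
  [black, black, black],
  [black, black, black],
  [red, red, red],
];

const height = image.length; // number of pixel rows
const width = image[0].length; // number of pixel columns

// Pixel at position (i, j) = (row 4, column 1), counting from 0 (an assumption).
const [r, g, b] = image[4][1];

console.log(height, width); // 5 3
console.log(r, g, b); // 255 0 0`}
			</Box>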
			<Text mt={5}>
				With this, I hope people can visualize an image clearly. In
				the next article, we’ll see how mathematical operations
				(filters) on a matrix (a.k.a. an image) yield magical results.
			</Text>
		</Box>
	);
};

export default FeelThePixels;
