I have a special use case: my security cameras have terrible “movement” recognition, so we’d get a lot of useless recordings when the bushes shook, while things like the mailman wouldn’t register.
I knew I wanted to implement object recognition, and YoloV5 seemed to fit the problem. I’d encourage everyone to take literally 30 seconds to look at the YoloV5 link, since analyzing an image takes about 4 lines of code. The barrier to entry is so low.
Also, I’m not a Python developer, so I’m sorry about the shit code.
The Idea
I don’t need per-frame analysis, and from my experience with FFMPEG, I know it can output frame grabs. Let’s analyze for cars and humans twice a second (every 0.5 seconds). So we just need to stream images without persisting them to disk (because that wouldn’t be efficient) and pipe them into YoloV5.
Streaming Images via FFMPEG
import subprocess
from subprocess import Popen

# Pipes PNG bytes to STDOUT ("url" is a placeholder for the camera's RTSP address)
ffmpeg_command = ["ffmpeg", "-rtsp_transport", "tcp", "-i", "url",
                  "-f", "image2pipe", "-c:v", "png", "-vf", "fps=2", "pipe:1"]
# stderr is discarded: an unread PIPE can fill up with ffmpeg's log output and stall it
pipe = Popen(ffmpeg_command,
             stdout=subprocess.PIPE,
             stderr=subprocess.DEVNULL,
             bufsize=10**8)
The main thing here is that we’re taking an RTSP input (via TCP) and outputting PNG bytes to STDOUT at 2 fps.
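To make the individual flags easier to follow, here is a minimal sketch that builds the same argument list for any camera address and frame rate. The function name build_ffmpeg_command and the example address are made up for illustration; the flags themselves are the ones used in the command above.

```python
def build_ffmpeg_command(rtsp_url, fps=2):
    """Return an ffmpeg argument list that writes PNG frames to STDOUT."""
    return [
        "ffmpeg",
        "-rtsp_transport", "tcp",  # pull the RTSP stream over TCP
        "-i", rtsp_url,            # input: the camera's RTSP address
        "-f", "image2pipe",        # output a stream of images instead of a file
        "-c:v", "png",             # encode every frame as a PNG
        "-vf", f"fps={fps}",       # keep only `fps` frames per second
        "pipe:1",                  # write to STDOUT
    ]


# Hypothetical camera address, used only as an example
command = build_ffmpeg_command("rtsp://camera.example/stream", fps=2)
```

The resulting list can be handed to Popen in place of the hard-coded one.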
Then we need to read the buffer and look for the decimal byte values 137 80 78 71 13 10 26 10, which mark the start of a PNG. Everything from that marker up to the next occurrence of the same magic number is one image.
start = True
currentPngBytes = bytes([])
# The 8-byte magic number at the start of every PNG
pngMagic = bytes([137, 80, 78, 71, 13, 10, 26, 10])
while pipe.poll() is None:
    readBytes = pipe.stdout.read(10**4)
    # Look for the start of the next PNG in this chunk.
    # Note: a marker split across two reads, or a second marker
    # in the same chunk, is not handled here.
    index = readBytes.find(pngMagic)
    if index == -1:
        # No marker: the whole chunk belongs to the current image
        currentPngBytes += readBytes
        continue
    if not start:
        # Everything before the marker completes the current image
        currentPngBytes += readBytes[:index]
        analyzePng(currentPngBytes)
    start = False
    # The marker and everything after it begin the next image
    currentPngBytes = readBytes[index:]
print("Killing Pipeline")
pipe.kill()
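The loop above looks at one marker per read and assumes the marker never straddles two reads. As a hedged alternative, here is a small sketch of a generator that keeps a running buffer, so both of those cases are covered. The name split_pngs is made up for this example, and it still assumes the 8 magic bytes never show up inside the image data itself.

```python
PNG_MAGIC = bytes([137, 80, 78, 71, 13, 10, 26, 10])  # 8-byte PNG signature


def split_pngs(chunks):
    """Yield complete PNG images from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        first = buffer.find(PNG_MAGIC)
        if first == -1:
            # No signature yet: keep only a short tail, in case the
            # signature is split between this chunk and the next one
            buffer = buffer[-(len(PNG_MAGIC) - 1):]
            continue
        # Drop anything that came before the first signature
        buffer = buffer[first:]
        # Every further signature ends the image at the front of the buffer
        next_start = buffer.find(PNG_MAGIC, len(PNG_MAGIC))
        while next_start != -1:
            yield buffer[:next_start]
            buffer = buffer[next_start:]
            next_start = buffer.find(PNG_MAGIC, len(PNG_MAGIC))
    # At the end of the stream, whatever is left is the last image
    if buffer.startswith(PNG_MAGIC):
        yield buffer
```

With the pipe from earlier, the chunks could come from iter(lambda: pipe.stdout.read(10**4), b""), and each yielded image could be passed straight to analyzePng.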
Analyzing the Raw Image Bytes
I load the bytes into a PIL image, crop out the part I don’t want to analyze, set the classifications I care about, then analyze.
To look up the class ids of the default COCO data set, use this list (but subtract 1 from each id): https://gist.github.com/AruniRC/7b3dadd004da04c80198557db5da4bda
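import io
from PIL import Image

# `model` is assumed to be a YoloV5 model loaded once beforehand (e.g. via PyTorch Hub)
def analyzePng(pngBytes):
    img = Image.open(io.BytesIO(pngBytes))
    # crop the top 50 pixels of the image because I don't want to analyze that section
    croppedImg = img.crop((0, 50, img.width, img.height))
    # COCO classes I care about: 0 = person, 2 = car, 3 = motorcycle, 7 = truck
    model.classes = [0, 2, 3, 7]
    results = model(croppedImg)
    # print a summary of the detections
    results.print()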
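As a rough sketch of the “subtract 1” rule for class ids: assuming the linked list numbers classes from 1 (with 0 reserved for background), the ids for model.classes can be derived like this. The dictionary holds only a handful of entries typed in by hand for illustration, and the helper name to_model_ids is made up.

```python
# A few entries in the 1-based numbering assumed for the linked list
# (0 would be the background class)
LIST_IDS = {
    "person": 1,
    "bicycle": 2,
    "car": 3,
    "motorcycle": 4,
    "bus": 6,
    "truck": 8,
}


def to_model_ids(names):
    """Convert class names to the 0-based ids used in model.classes."""
    return sorted(LIST_IDS[name] - 1 for name in names)


wanted = to_model_ids(["person", "car", "motorcycle", "truck"])
print(wanted)  # [0, 2, 3, 7]
```

This gives the same four ids that are set on the model in analyzePng.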
Summary
In the end, I was extremely surprised at how easy it was to do object detection with YoloV5. It really lowers the barrier to entry for object detection. I’m still working on result manipulation, but this will link into the security code very easily.