It’s been almost 9 months since I started working on my thesis on fall detection via computer vision.
I have a lot of new things to share, but I realized that I haven’t said much about how to replicate what I’m doing, so let’s change that now.
The datasets
The paper I’m using to compare my end results against is called “Vision based human fall detection with Siamese convolutional neural networks”; you can access the paper here. From now on I’ll refer to it as “the work done by Berlin et al.”
The datasets used there are the University of Rzeszow Fall Dataset (URFD), which you can download here, and a second dataset called the Fall Dataset. I’ll explain the process for only one of the datasets since it’s very similar for both; the only difference is how the second dataset labels its data, but if you know your way around Python and OpenCV or PIL you’ll be fine. The URFD dataset keeps the label information in a .csv file, and it took me some doing to accurately label each frame. But before building and setting up the data we need to preprocess the images to compute the optical flow.
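As a starting point, loading the per-frame labels can be as simple as building a lookup from the CSV. This is only a sketch: the column layout below (sequence name, frame number, label) is an assumption, so check it against the actual URFD CSV before relying on it.

```python
import csv
from collections import defaultdict

def load_frame_labels(csv_path):
    """Build a {sequence: {frame_number: label}} lookup from a labels CSV.

    Assumes rows shaped like: sequence_name, frame_number, label
    (a hypothetical layout -- verify against the real URFD file).
    """
    labels = defaultdict(dict)
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip headers or malformed rows
            seq, frame, label = row[0].strip(), int(row[1]), int(row[2])
            labels[seq][frame] = label
    return labels
```

With a lookup like this, matching each extracted frame image to its label is just a dictionary access instead of re-scanning the CSV per frame.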
Optical flow
If you are not familiar with optical flow, this oversimplification is all you need: you take two images, usually adjacent frames from a video, and compute a per-pixel motion map that segments the moving items, as shown in the image below.
Optical flow has exact mathematical methods of computation; however, they can be complex and lengthy, so in computer vision we usually use AI models to estimate the flow map. In the work done by Berlin et al. they use the PWC-Net model developed by NVIDIA in 2018. Their GitHub code is pretty old and unmaintained, so my advice is to avoid it if possible. But if you are in a situation like mine, where you need to replicate the results of a paper, follow these steps for the PyTorch implementation (keep in mind that the official implementation is done in Caffe, which I haven’t been able to install):
- Use a VM with ubuntu 16.04
- Install version 5 of the gcc compiler (gcc-5)
- Now you can follow the installation instructions from the GitHub repo for the PyTorch version
- Make sure you install a CUDA toolkit compatible with Python 2.7 and with the version of PyTorch the repo supports
If any errors occur, you’ll be able to fix them by searching forums. I only list these steps explicitly because no one told me it would work on an older version of Ubuntu; I figured it out on my own after weeks of failed installations.
You might be able to make things work with that PyTorch implementation, but I found another GitHub repo that uses somewhat newer libraries, and there PWC-Net is properly implemented for PyTorch, unlike the first one. For that one, just install the compatible versions specified, in a conda environment on Ubuntu 18.04, and you should be fine.
Now that installation is complete, we need to rearrange the code in run.py (or whichever main Python file your chosen installation uses). The code that computes the optical flow needs to be wrapped into a function like
def PWCNetCompute(frame1, frame2, outputfile)
By default these implementations write to a folder or file, so the only thing you need to provide is a file path. If you can automate reading the image file paths and writing the output paths, most of the work is done. I don’t think you’ll have as much trouble writing the code as you would reading mine, but if you want my code you’ll have to wait a bit until I can comment and organize everything.
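The automation part can be as small as the sketch below. Here `flow_fn` stands in for a PWCNetCompute-style wrapper that takes two frame paths and an output path; the folder layout and the .png/.flo naming are my assumptions, not part of any repo.

```python
import os
from glob import glob

def compute_flows_for_sequence(frames_dir, out_dir, flow_fn):
    """Run a flow function over every adjacent frame pair in a folder.

    flow_fn(frame1_path, frame2_path, out_path) is a stand-in for a
    PWCNetCompute-style wrapper; file naming here is illustrative.
    """
    os.makedirs(out_dir, exist_ok=True)
    frames = sorted(glob(os.path.join(frames_dir, "*.png")))
    for f1, f2 in zip(frames, frames[1:]):  # adjacent pairs
        name = os.path.splitext(os.path.basename(f1))[0] + "_flow.flo"
        flow_fn(f1, f2, os.path.join(out_dir, name))
```

Sorting the paths matters: the flow is only meaningful if the pairs really are consecutive frames, so zero-padded frame numbers (frame001.png, frame002.png, …) keep the lexicographic order correct.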
The proposed model
I showed my proposed model in the last blog if you want to see it. It’s basically a Siamese neural network that uses transformer blocks for feature extraction rather than CNN layers. I’ll just provide an update on the model’s results here, and I’ll write a separate blog for the PyTorch code of both the original Berlin et al. model and my proposed model.
SNN with CNN layers
SNN with transformer encoders
I’m still playing with the configurations and tuning the right parameters for both implementations. It’s not looking too good yet, but to me this is big, because I spent about 5 weeks trying to get the recall or accuracy past 0.020 (2%). There is a lot of information the paper doesn’t share, so I need to investigate many topics I know very little about. But I’m still very excited considering I knew nothing about building AI models 9 months ago. I still have a long way to go, but I just need to keep going.
Thanks for reading :)