Step 1: Image filtering

Rick · Apr 24, 2024

I might change a few things in the future.

In the last article I gave an oversimplified view of the methodology of the paper by [1]. In this article I'll go over the filtering stage.

The three filters

In Section III.A the authors describe their methods for filtering the depth images.

These three stages are not an invention of the authors: their solution uses the Intel® RealSense™ Depth Camera D435i, and the manufacturer has published an article presenting the methods and how to apply them with the camera's integrated library. The stages are, in order, a non-zero subsampling, a spatial filter that preserves edges, and a temporal filter that fills holes in the image.

The authors also list the methods and the parameters they used in Table 2.

However, all of this assumes an Intel® RealSense™ depth camera. In my case (at least to start) I need to use the data from the URFD and Fall Detection datasets, which were both recorded with an Xbox Kinect camera. I'll begin by following the authors' approach, and later I'll check whether other filters or methods better suit my application. Either way, I'll keep the three stages: smart subsampling, spatial filtering to maintain edges, and temporal filtering to get rid of missing data.

Non-zero subsampling

Subsampling is a well-known method and usually doesn't merit an in-depth explanation; however, Intel recommends doing a smarter version of it.

(Figure: usual subsampling example)

The manufacturer explains that a smarter approach consists of not only taking every nth pixel, but taking the average of the 15 nearest neighbors, excluding zeros.

Here's the code I wrote in Python for this:

import numpy as np

def nonZeroDownsampling(input):
    # Average all non-zero pixels in the block; zeros are missing depth values.
    average = 0
    n = 0
    for i in input:
        for j in i:
            if j > 0:
                average += j
                n += 1
    if n == 0:
        n = 1  # avoid division by zero when the whole block is missing
    return average / n

def subsampling_avg(matrix, subs=4):
    [h, w] = matrix.shape
    result = np.zeros((int(h / subs), int(w / subs)), dtype=np.uint16)
    n = 0
    for i in range(0, (h // subs) * subs, subs):
        m = 0
        for j in range(0, (w // subs) * subs, subs):
            # Replace each subs x subs block with its non-zero average.
            intensity = nonZeroDownsampling(matrix[i:i + subs, j:j + subs])
            result[n, m] = intensity
            m += 1
        n += 1
    return result

I decided to use the first four frames of the fall-01 sequence from the URFD dataset. On the left we can see the raw frames with the measurement correction detailed on the URFD page. It's clear that the image has a lot of missing values on the floor and the rest of the furniture. On the right side we have the non-zero subsampling result, which has done a decent job at reducing and closing the gaps.
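
For reference, here's a minimal sketch of how I load one frame and run the subsampling on it. The file path is a made-up example, and I'm assuming the URFD depth frames are 16-bit PNGs that need cv2.IMREAD_UNCHANGED to load without being converted to 8 bits; adjust the path to wherever your copy of the dataset lives.

import cv2

# Hypothetical path to one of the URFD depth frames (adjust to your local copy).
frame_path = "urfd/fall-01-cam0-d/fall-01-cam0-d-001.png"

# IMREAD_UNCHANGED keeps the 16-bit depth values instead of converting to 8-bit.
depth = cv2.imread(frame_path, cv2.IMREAD_UNCHANGED)

# Non-zero subsampling with 4x4 blocks: the output is 1/4 the size in each dimension.
small = subsampling_avg(depth, subs=4)
print(depth.shape, "->", small.shape)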

Exponential Moving Average — EMA filtering

This one is a bit new to me; I haven't used this function before. According to the manufacturer's article, this filter is supposed to help with edge preservation. We scan the image along the x axis (row by row) and apply the EMA function as if each row were a 1D signal, and then do the same along the y axis (column by column).

Here’s the code

def EMA1D(array, alpha, thresh):
    result = np.zeros(len(array))
    result[0] = array[0]
    last_value = result[0]
    for i in range(1, len(array)):
        if abs(array[i] - array[i - 1]) < thresh:
            # Small change: smooth it with the exponential moving average.
            result[i] = alpha * array[i] + (1 - alpha) * last_value
        else:
            # Large change: treat it as an edge and keep the raw value.
            result[i] = array[i]
        last_value = result[i]
    return result

def EMA2D(img, alpha, thresh):
    img = np.asarray(img)
    [x, y] = img.shape
    result = np.zeros((x, y), dtype=np.uint16)
    # First pass: filter each row (x axis).
    for i in range(x):
        result[i, 0:y] = EMA1D(img[i, 0:y], alpha, thresh)
    # Second pass: filter each column (y axis) of the row-filtered result.
    for j in range(y):
        result[:, j] = EMA1D(result[:, j], alpha, thresh)
    return result
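
Continuing from the subsampling sketch above (reusing the small frame from there), the spatial pass would run like this. The alpha and thresh values are illustrative picks of mine, not the values from the paper's Table 2: alpha controls how strongly each pixel is pulled toward the running average, and the threshold decides how big a depth jump has to be before it is treated as an edge and left untouched.

# Illustrative parameters (not the ones from the paper's Table 2).
alpha = 0.5   # smoothing factor of the exponential moving average
thresh = 20   # depth jump (in depth units) treated as an edge and preserved

# 'small' is the subsampled frame from the previous sketch.
smoothed = EMA2D(small, alpha, thresh)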

Now, the result does seem a little bit off to me; it may be because I don't know how to accurately choose the right value for the threshold. I'll check why this is in more detail in another article, but for now I'll move on.

Time filter

This one is quite simple: we use the values from previous frames to fill in the missing values. In our case we will use a maximum of two previous frames to fill our data.

def fillMissingValues(img0, img1, img2):
    # img0 is the current frame, img1 the previous one, img2 two frames back.
    [h, w] = np.asarray(img0).shape
    result = np.zeros((h, w), dtype=np.uint16)
    for i in range(h):
        for j in range(w):
            if img0[i][j] == 0:
                if img1[i][j] == 0 and img2[i][j] > 0:
                    # Missing in the last two frames: fall back to two frames ago.
                    result[i][j] = img2[i][j]
                elif img1[i][j] == 0 and img2[i][j] == 0:
                    # Missing everywhere: the pixel stays empty.
                    result[i][j] = img0[i][j]
                else:
                    # Missing now but present in the previous frame.
                    result[i][j] = img1[i][j]
            else:
                # Valid measurement in the current frame: keep it.
                result[i][j] = img0[i][j]
    return result

Here the result will only be two images, since filtering frame 3 needs information from frames 2 and 1, and frame 4 needs information from frames 3 and 2.
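
To tie it all together, here's a rough sketch of how I chain the three stages over a list of frames. The helper name run_pipeline and the parameter values are my own illustration of how the pieces fit, not code from the paper; as noted above, the first two frames only serve as history for the time filter, so four input frames produce two filtered outputs.

def run_pipeline(frames, subs=4, alpha=0.5, thresh=20):
    """Apply non-zero subsampling, the EMA spatial filter, and the time filter.

    `frames` is a list of raw depth images; the first two frames are only
    used as history for the time filter, so len(output) == len(frames) - 2.
    """
    # Stages 1 and 2: subsample and smooth every frame independently.
    filtered = [EMA2D(subsampling_avg(f, subs), alpha, thresh) for f in frames]

    # Stage 3: fill the remaining holes using the two previous frames.
    output = []
    for k in range(2, len(filtered)):
        output.append(fillMissingValues(filtered[k], filtered[k - 1], filtered[k - 2]))
    return output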

Next steps

My next post might be a two- or even a three-parter, since I'm tempted to look for new filters better suited to images generated with the Xbox Kinect. In addition, I'll be tackling the RANSAC algorithm for ground segmentation, which will require a bit of trigonometry.

References

[1]. Z. Li, F. Song, B. C. Clark, D. R. Grooms, and C. Liu, “A Wearable Device for Indoor Imminent Danger Detection and Avoidance With Region-Based Ground Segmentation,” IEEE Access, vol. 8, pp. 184808–184821, 2020, doi: https://doi.org/10.1109/access.2020.3028527.
