Why Detecting Nematode Eggs Matters
In many laboratories, identifying nematode eggs still relies heavily on manual microscopy.
A technician examines the sample under a microscope and visually searches for the eggs. This process requires training, concentration, and a lot of time.
Recent advances in deep learning, with models like YOLO, Faster R-CNN, and Mask R-CNN, have already been applied to medical imaging, agricultural monitoring, and microscopic analysis. After a few hours of research, I was convinced I could build a fully automated process for detecting nematode eggs in microscopic images.
The Problem: Microscopic Images Are Often Overexposed, Underexposed, and Filled with Debris
Detecting nematode eggs is not the same as detecting everyday objects like cars or people.
The key challenges include:
- Noisy background — microscopic samples often contain debris, particles, and artefacts that can easily be mistaken for eggs.
- Shape similarity — eggs can resemble air bubbles, organic debris, or other microscopic particles.
- Lighting variations — microscopy images often vary in brightness and contrast, and can be easily overexposed or underexposed depending on the imaging conditions.
- Limited training data — there are relatively few publicly available datasets of annotated nematode egg images.
- Lack of specialised pretrained models — most computer vision models are trained on general datasets (such as COCO or ImageNet), which contain everyday objects rather than microscopic biological structures.
Example microscopic sample:
If you look closely, you will see a faint oval shape at the 5 o'clock position, in the middle of the bottom half of the image: that is the egg!
Step 1 — Building the Dataset
Before training a model, the dataset needs to be labelled.
Each egg in the image must be manually annotated with a bounding box. I only had 10 images at the time, so labelling them myself didn't take long (although at first I genuinely thought the water bubbles were the eggs … quite embarrassing when I presented my preliminary findings).
Annotations were stored in Pascal VOC XML format, which is commonly used for object detection training. Each annotation includes:
- object class
- bounding box coordinates
- image dimensions
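As a rough sketch of how these annotations can be read back for training, here is a minimal Pascal VOC parser using only Python's standard library (the class name and box values below are illustrative, not from the real dataset):

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_string):
    """Parse a Pascal VOC XML annotation into (width, height) and a list of boxes."""
    root = ET.fromstring(xml_string)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append({
            "class": obj.find("name").text,
            "xmin": int(bb.find("xmin").text),
            "ymin": int(bb.find("ymin").text),
            "xmax": int(bb.find("xmax").text),
            "ymax": int(bb.find("ymax").text),
        })
    return (width, height), boxes

# A made-up annotation with a single labelled egg
example = """
<annotation>
  <size><width>1024</width><height>768</height><depth>3</depth></size>
  <object>
    <name>nematode_egg</name>
    <bndbox><xmin>410</xmin><ymin>520</ymin><xmax>455</xmax><ymax>560</ymax></bndbox>
  </object>
</annotation>
"""
dims, boxes = parse_voc_annotation(example)
```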
Step 2 — Choosing a Detection Model
For this project, I experimented with several detection approaches:
- OpenCV-based detection (baseline — which obviously didn’t do very well)
- Faster R-CNN (one of the more established models at the time. The name sounds cool and there are lots of GitHub repos and examples to reference)
- YOLO object detection models (one of the most popular detection frameworks. There’s a lot of good community support around it. I had access to YOLOv8 at the time and experimented with a few different versions — I think I ended up mainly using v5 and v8 for comparison)
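One practical detail of switching between these frameworks: YOLO expects normalised centre-based labels rather than VOC's pixel corner coordinates, so the annotations need converting. A minimal sketch of that conversion (pure Python; the example box is illustrative):

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert VOC corner coordinates to YOLO's normalised
    (x_center, y_center, width, height) format, all in [0, 1]."""
    x_c = (xmin + xmax) / 2.0 / img_w
    y_c = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x_c, y_c, w, h

# A 45x40 px box on a 1024x768 image
x_c, y_c, w, h = voc_to_yolo(410, 520, 455, 560, 1024, 768)
```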
Step 3 — Training the Model
The dataset was organised into the typical machine learning structure:
dataset/
├── train/
├── val/
└── test/
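For the YOLO experiments, this structure is referenced from a small dataset config file. A sketch of what it can look like, assuming the Ultralytics-style YAML format (paths and class name here are illustrative):

```yaml
# dataset.yaml: Ultralytics-style dataset config (illustrative paths)
path: dataset
train: train/images
val: val/images
test: test/images

names:
  0: nematode_egg
```

With a config like this, training can be launched with the Ultralytics CLI, e.g. `yolo detect train data=dataset.yaml model=yolov8n.pt epochs=100`.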
During training, the model learns to recognise patterns such as:
- egg shape
- edges
- texture
- size relative to background particles
Example training output:
Epoch 1: Loss = 0.83
Epoch 5: Loss = 0.36
Epoch 10: Loss = 0.14
As training progresses, the model gradually improves its ability to identify eggs.
Step 4 — Running Detection
Once trained, the model predicts bounding boxes for eggs.
Example result:

- Green boxes: model predictions
- Blue boxes: ground truth annotations
Evaluating the Model
To measure detection performance, several standard computer vision metrics are used.
Intersection over Union (IoU)
Measures how closely predicted boxes match the true boxes.
IoU = area of overlap / area of union
Precision
How many detected eggs are correct.
Precision = True Positives / (True Positives + False Positives)
Recall
How many real eggs the model successfully finds.
Recall = True Positives / (True Positives + False Negatives)
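The three formulas above translate directly into code. A self-contained sketch (the boxes and the true/false positive counts are made up purely for illustration):

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (clamped to zero if the boxes are disjoint)
    inter = max(0, min(ax2, bx2) - max(ax1, bx1)) * \
            max(0, min(ay2, by2) - max(ay1, by1))
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

same = iou((0, 0, 10, 10), (0, 0, 10, 10))      # identical boxes -> 1.0
shifted = iou((0, 0, 10, 10), (5, 0, 15, 10))   # half-overlap -> 1/3
p, r, f1 = precision_recall_f1(tp=30, fp=3, fn=4)
```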
Example results:
Precision: 0.91
Recall: 0.88
F1 Score: 0.89
These numbers suggest the model can detect most eggs while maintaining a relatively low false positive rate.
The Hardest Part: Background Noise
One surprising challenge in this project was the amount of visual noise in microscopic images.
Things that can confuse the model include:
- dust
- debris
- air bubbles
- irregular particles
Like the one below – could you even spot the egg if the annotation weren't there?

In those cases, increasing the sample size and applying strong augmentation made a significant difference.
Where only part of the nematode egg is visible (known as occlusion), not all models cope equally well. YOLO with strong augmentation, or a larger pretrained variant such as YOLOv8m, handled these cases better, whereas Faster R-CNN tended to struggle.
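"Strong augmentation" here mostly means flips, rotations, and brightness jitter applied to both the images and their boxes. As a minimal pure-Python illustration of the label bookkeeping involved, a horizontal flip in YOLO's normalised format (x-centre, y-centre, width, height, all in [0, 1]) only mirrors the x-centre:

```python
def hflip_yolo_box(x_c, y_c, w, h):
    """Horizontally flip a YOLO-normalised box: mirror the x-centre,
    leaving the y-centre and the box size unchanged."""
    return 1.0 - x_c, y_c, w, h

# An egg on the left of the image moves to the right after flipping
flipped = hflip_yolo_box(0.25, 0.70, 0.04, 0.05)
```

In practice an augmentation library handles this for you, but when labels come out misaligned after augmentation, this is usually the kind of transform that went wrong.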

What I Learned From This Project
Working on this project reinforced several lessons about real-world machine learning.
1. Data preparation matters more than the model
Even the smartest models get confused if the dataset is annotated inaccurately, especially when you think water bubbles are the eggs. Honestly, you can't trust a non-professional's judgment!
2. A small sample size is difficult
In theory, cross-validation would be nice, but with so few examples it doesn't work particularly well. I basically had two options: convince my client to collect many more labelled examples (easier said than done), or lean heavily on data augmentation to make the most of the samples we already had.
3. Evaluation metrics are essential
Looking at only the predictions can be misleading; a set of metrics on the screen is far more convincing (for your client).
Final Thoughts
🙂
Project Repository
If you’d like to explore the code or experiment with the model yourself:
GitHub:
https://github.com/shion92/automated-detection-of-nematode-Eggs-
I also have a collaboration repo with a fellow student, who built the interface and had to deal with my coding mess:
https://github.com/jwqiu/Automated-Nematode-Egg-Detection
Live Site (you can try it yourself, although I'm not sure if it's still being maintained):
jwqiu.github.io/Automated-Nematod
If you are working on microscopy, computer vision, or biological image analysis, I’d love to hear about your experience as well. Feel free to reach out 🙂