Exploring insect wing venation with persistent homology

Code & Notebook on GitHub

This post walks through a Python pipeline for analyzing insect wing venation using persistent homology, a tool from topological data analysis (TDA). By preprocessing wing images into 2D point clouds, we can detect loop structures in the venation using TDA. I demonstrate the method using wing images from a dragonfly, cicada, and grasshopper, and briefly discuss how their structural patterns are shown in persistence diagrams.

wing1 wing2 wing3

Wing photo source: Dragonfly, Cicada, Grasshopper

Procedure

Outline:

Use scikit-image to preprocess images
Use scikit-TDA (ripser.py and persim) for topological data analysis

1. Process images

Let’s use a forewing of dragonfly as the example. The image preprocessing includes the following steps:

Load image in grayscale
Denoise
Convert to binary image
Skeletonize
Make into point cloud

import numpy as np
import matplotlib.pyplot as plt
from skimage import io, color, filters, morphology
from skimage.restoration import denoise_bilateral
from skimage.filters import threshold_local
from skimage.morphology import skeletonize

1.1 Load image in grayscale

img = io.imread("wing_dragonfly.jpg", as_gray=True)

1.2 Denoise

The printed image might not look much different before and after denoising, but it does clean out some dots within the venation windows later in the binary image, which would likely make some difference in TDA results.

img_denoise = denoise_bilateral(img, sigma_color=0.05, sigma_spatial=15)

1.3 Convert to binary image

Instead of using a fixed global threshold (which may fail under uneven lighting), we apply local thresholding, which adapts based on the pixel’s neighborhood. The local threshold is computed as the weighted mean of surrounding pixels (set by block_size), minus a constant (offset).

local_thresh = threshold_local(img_denoise, block_size=35, offset=.001)
img_binary = img_denoise < local_thresh

1.4 Skeletonize

Skeletonization reduces each vein to a 1-pixel-wide path, preserving the structure while greatly reducing the number of points.

img_skeleton = skeletonize(img_binary)

1.5 Convert to point cloud

# Get coordinates of all foreground pixels (vein areas)
points = np.column_stack(np.nonzero(img_skeleton))  # shape: (N, 2)

1.6 Subsampling

Subsampling reduces computational cost in TDA, but there’s a trade-off: too few points may miss topological features, while too many increase runtime. Choose n_sample based on dataset size and analysis goals.

idx = np.random.choice(len(points), size=n_sample, replace=False)
points_sampled = points[idx]

2. Topological data analysis (TDA)

We use persistent homology to identify loop-like structures in the wing’s point cloud. Specifically, we apply a Vietoris–Rips filtration to extract H0 (connected components) and H1 (loops). For background, see intro video on TDA by Matthew Wright.

from ripser import ripser
from persim import plot_diagrams

# Rips filtration
result = ripser(points_sampled, maxdim=1)  # setting maxdim=1 would detect H0 and H1 features
diagrams = result["dgms"]

# Plot persistent diagram
plt.figure(figsize=(5, 5))
plot_diagrams(diagrams, show=True, lifetime=True, size=5)

The output persistent diagram shows each detected loops in orange dot (H1), with more “significant” features (those with longer lifespan across distance threshold) locating higher up in y axis, while the less significant or noise features lower down at the bottom. From the results, I suppose the highest two orange points represent the big windows at the top left of the wing.

Applying the workflow to other insect wings

Now I apply the same procedure to process two additional wing images.

Grasshopper (hindwing)

Image processing

Persistence digram

Cicade (forewing)

Image processing

Persistence digram

The 15 orange dots higher up in the persistence diagram likely correspond to the 15 distinct windows in the cicada forewing. The dots near the bottom may represent noise or small structures near the wing base.

Refleciton

Point sampling density significantly affects the persistence diagram, especially the Birth values of H1 features. Choosing the sampling level is a trade-off between computational efficiency and topological fidelity.
- In densely sampled images (e.g. cicada), true loop structures (wing windows) tend to appear as a vertical line of H1 points with low Birth values but varying Lifetime. Those with higher Birth values are likely to represent noise.
- In contrast, sparser sampling spreads H1 features more broadly across the birth axis (e.g. grasshopper), making it harder to distinguish true biological structures from noise.
In this particular case of analyzing wing venation with TDA, the actual venation information is probably mostly captured in the lifetime of H1 features (distribution along the y-axis), whereas the birth axis reflect point sampling density and information about how likely the signals are noise.