Decoding AI and Search Ecosystems

My research is dedicated to reverse-engineering digital ecosystems, with a primary focus on modeling Google Seed Sites and analyzing how Large Language Models (LLMs) vectorize, interpret, and compare textual data.
A central theme of my recent work involves the reliability of semantic similarity metrics in machine learning. In my latest paper, I explore how “cosine similarity” measurements are frequently distorted by a phenomenon known as “Diagonal Rescaling.” This mathematical anomaly skews the perceived similarity between words and documents, creating a critical blind spot in automated systems. Read the full paper: Cosine Similarity under Diagonal Rescalings.
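The distortion is easy to reproduce. Below is a minimal numerical sketch (my own illustration, not the paper’s code): applying the same diagonal rescaling D to two vectors generally changes their cosine similarity, even though nothing about their “meaning” has changed.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
x, y = rng.normal(size=300), rng.normal(size=300)

# Diagonal rescaling: each embedding dimension gets its own weight,
# equivalent to multiplying both vectors by a diagonal matrix D.
d = rng.uniform(0.1, 10.0, size=300)

print(f"cos(x, y)   = {cosine(x, y):+.4f}")
print(f"cos(Dx, Dy) = {cosine(d * x, d * y):+.4f}")  # generally differs
```

Only when D is a multiple of the identity is the similarity preserved; any other rescaling can move scores up or down, which is exactly the blind spot automated pipelines inherit.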
The Risks of Automated Optimization
This research highlights significant vulnerabilities in modern SEO and AI Optimization (AIO). Relying purely on semantic similarity scores to drive large-scale, automated content generation and rewriting carries substantial risk. Without rigorous manual quality assurance or proper validation of the underlying transformer models, automated systems may report high optimization scores while actually producing textual errors, factual anomalies, or nonsensical output.
My objective is to map these algorithmic blind spots, ensuring that AI-driven content scaling and search optimization remain both mathematically sound and practically effective.
Topical Geometry – Mapping the Optimal Topical Geometries (OTG) in AI and Search
My research is grounded in the reality that modern search engines and Large Language Models (LLMs) rely on vector embeddings to process language, meaning semantics is fundamentally mathematics. Building on this, I have developed a model to map Optimal Topical Geometries (OTG): the ideal mathematical structure a website needs to dominate its niche.
Here are the core principles of my model:
1. Vector Spaces and the Site Centroid
Every website possesses a mathematical center of gravity, a Centroid (C_site), within a high-dimensional vector space. By calculating this, we can precisely position a website relative to the core topics of its specific industry.
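As a rough sketch of the computation, assuming page-level embeddings from any standard sentence-embedding model (the function name and array shapes are illustrative, not a published API):

```python
import numpy as np

def site_centroid(page_embeddings: np.ndarray) -> np.ndarray:
    """Unit-length centroid C_site of a site's (n_pages, dim) embedding matrix."""
    unit = page_embeddings / np.linalg.norm(page_embeddings, axis=1, keepdims=True)
    c = unit.mean(axis=0)          # mean direction of all pages
    return c / np.linalg.norm(c)   # re-normalize onto the unit sphere
```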
2. The Core Variables: Focus Score (FS) and Radius (R)
To evaluate a website’s “geometry,” my model relies on two primary variables, sketched in code after this list:
- Focus Score (FS): Density. A measure of how closely a website’s pages orbit their own centroid, typically calculated using cosine similarity.
- Radius (R): Spread. A metric defining how far the site’s content stretches from its core. I typically use R90 (the radius containing 90% of the site’s most important content mass) to filter out structural noise.
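A minimal sketch of both variables, under the simplifying assumptions that FS is the mean cosine similarity to the centroid and R90 is the 90th-percentile cosine distance; importance-weighting of the content mass is omitted here:

```python
import numpy as np

def focus_score(pages: np.ndarray, centroid: np.ndarray) -> float:
    """FS (density): mean cosine similarity of the pages to the site centroid.
    `centroid` is assumed unit-length, as returned by site_centroid above."""
    unit = pages / np.linalg.norm(pages, axis=1, keepdims=True)
    return float((unit @ centroid).mean())

def radius_r90(pages: np.ndarray, centroid: np.ndarray) -> float:
    """R90 (spread): the cosine distance within which 90% of pages fall,
    so the farthest 10% are discarded as structural noise."""
    unit = pages / np.linalg.norm(pages, axis=1, keepdims=True)
    return float(np.percentile(1.0 - unit @ centroid, 90))
```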
3. Archetypes and the “Goldilocks Zone”
There is no single, universal FS/R ratio that guarantees success. Instead, winning websites cluster into specific “Safe Zones” depending on their archetype. A broad financial news portal and a hyper-niche credit card comparator can both dominate search, but they possess entirely different geometries. The model identifies these zones per market and vertical to establish an accurate mathematical blueprint.
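One plausible way to locate these zones, assuming a table of (FS, R90) pairs for the winning sites in a vertical, is plain clustering; each cluster centre then approximates one archetype’s “Safe Zone.” A sketch using scikit-learn:

```python
import numpy as np
from sklearn.cluster import KMeans

def safe_zones(winners_fs_r: np.ndarray, n_archetypes: int = 3):
    """Cluster top-performing sites in (FS, R90) space.
    Each cluster centre approximates one archetype's Safe Zone."""
    km = KMeans(n_clusters=n_archetypes, n_init=10, random_state=0)
    labels = km.fit_predict(winners_fs_r)   # winners_fs_r: (n_sites, 2)
    return km.cluster_centers_, labels
```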
4. Strategic and Practical Applications
By reverse-engineering the optimal geometry (the blueprint) from the top 10% of performers in a specific vertical, we can apply this data to generate direct business value.
Identifying Content Gaps:
We can pinpoint invisible voids within the vector space: areas with high demand but insufficient geometric coverage from top competitors.
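A hedged sketch of one way to operationalize gap-finding, where `demand_vecs` stands for embeddings of in-demand queries and `competitor_pages` for the top performers’ page embeddings (all names and the distance threshold are illustrative):

```python
import numpy as np

def content_gaps(demand_vecs: np.ndarray,
                 competitor_pages: np.ndarray,
                 min_distance: float = 0.35) -> np.ndarray:
    """Indices of demand vectors whose nearest competitor page is farther
    than `min_distance` in cosine distance, i.e. uncovered demand."""
    d = demand_vecs / np.linalg.norm(demand_vecs, axis=1, keepdims=True)
    p = competitor_pages / np.linalg.norm(competitor_pages, axis=1, keepdims=True)
    nearest = 1.0 - (d @ p.T).max(axis=1)   # distance to the closest page
    return np.where(nearest > min_distance)[0]
```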
Content Strategy & Mapping:
We can mathematically demonstrate whether a site is too scattered (high radius) or too narrow. This allows us to build precise internal link clusters that steer the client’s site directly into the winners’ “Safe Zone.”
Topical Link Trust (Spam Filtering):
Backlinks are evaluated based on the donor’s geometry. A link from a site with a poor FS/R ratio (e.g., a sprawling, off-topic blog) is filtered out, maximizing topical relevance and mitigating algorithmic risk.
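A minimal sketch of such a donor filter; the FS/R threshold is illustrative, not a calibrated constant:

```python
def link_passes(donor_fs: float, donor_r90: float, min_ratio: float = 2.0) -> bool:
    """Keep a backlink only when the donor site is focused relative to
    its spread, i.e. its FS/R ratio clears the threshold."""
    if donor_r90 <= 0:        # degenerate single-topic donor: trivially focused
        return True
    return donor_fs / donor_r90 >= min_ratio
```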