Structural Convergence vs. Percentage Models: Reexamining European Sephardic DNA Through the Simonis Genome
- Weston Simonis
- Apr 16

The Mislabeling of European Sephardic Jews and the Simonis Line
The question of whether the Simonis family carries Sephardic Jewish lineage is not being raised here for the first time, nor is it being built from a single strand of evidence. That work has already been done across multiple layers. The historical record has been traced through documented migrations and recorded events. The naming systems have been examined through onomastic continuity across regions and time. The family structure has been followed through paper trails that connect Iberian, Mediterranean, and Northern European records. The broader human context—the movement of populations through exile, trade, and forced conversion—has already been established as the environment in which that lineage moved and survived.
The genetic data is not standing alone in this discussion. It is entering a framework that has already been built.
What is being addressed here is not whether that lineage exists, but why it is consistently misrepresented in modern DNA reporting—specifically in the way European Sephardic Jewish ancestry is absorbed into broader European categories and presented as something else.
Commercial DNA platforms such as AncestryDNA and MyHeritage present results through percentage-based models that assign ancestry according to modern population clusters. In the case of the Simonis profile, this results in a near-total European classification, often approaching 99.8%, with significant portions attributed to British & Irish or similar regional groupings.
At the same time, those same reports include disclaimers stating that they cannot determine precise geographic origins within those regions. They can assign a broad label, but they cannot identify where within that label the ancestry actually comes from.
That contradiction is the starting point.
A system that can confidently assign a population category while simultaneously acknowledging that it cannot resolve the internal structure of that category is not revealing the full picture of the genome. It is presenting a simplified version of it.
The purpose of this analysis is to move past that simplification.
Rather than relying on smoothed percentages, the genome is examined directly—chromosome by chromosome, segment by segment, and marker by marker. The goal is not to ask the system what the DNA is, but to look at the structure of the DNA itself and determine how it behaves.
When that is done, the result is not a flat European profile. It is a layered system of preserved segments, lineage continuity, and structural patterns that do not align with a single modern population category.
That is where the mislabeling of European Sephardic ancestry becomes visible.
It is not absent from the data. It is present in the preserved segments that the smoothing model averages out. It is present in the lineage structures that do not flatten into percentage categories. And it is present in the parts of the genome that must be isolated and examined directly in order to be seen.
The sections that follow do not attempt to re-establish the entire historical or genealogical case. That foundation is already in place. Instead, they move into the genetic layer with a different approach—one that does not rely on smoothing, averaging, or broad categorization, but on direct structural analysis.
What emerges from that analysis is not a generalized estimate.
It is a genome that carries preserved evidence of its inheritance, and that evidence does not resolve cleanly into the categories it has been assigned.
The Structural Breakdown Before SNP-Level Extraction
The commercial DNA result presents a clean and confident conclusion. The genome is categorized as overwhelmingly European, with major portions assigned to British & Irish, Western European, and Nordic groupings. On the surface, this appears stable, continuous, and geographically resolved. The presentation implies that the genome behaves as a single, uniform population field that can be cleanly mapped onto modern European regions.
But that surface breaks immediately once the chromosomes are examined directly.
When Chromosome 1 is opened and walked through window by window, the structure does not hold. The 35–45 Mb band reveals preserved cores at 36–37 Mb and a much stronger cold core at 42–43 Mb, where heterozygosity drops to extreme levels. That core is not surrounded by similar structure. It is bordered by higher recombination zones that behave differently, and then, further down the chromosome, the pattern reappears again in the 65–75 Mb band. There, a second system emerges: a preserved segment at 67–68 Mb followed by an even colder core at 73–74 Mb. These are not smooth gradients. They are interruptions. The chromosome breaks, reforms, and then preserves again.
That same structural behavior repeats across the genome.
Chromosome 17 carries two separate preserved corridors, one at 17–18 Mb and another at 58–59 Mb. Chromosome 18 mirrors that structure at 26–27 Mb and again at 58–59 Mb. Chromosome 19 compresses into a dense preserved center at 43–44 Mb. Chromosome 20 produces twin cores at 30–31 Mb and 34–35 Mb. These are not isolated anomalies. They are recurring features. Each time the chromosome is opened, the same pattern is observed: a preserved core appears, recombination increases around it, and then preservation returns again further along the sequence.
This is not how a uniform population behaves.
A genome that is the product of a single, continuous population field produces relatively even recombination across its length. Variation may exist, but it does not repeatedly collapse into ultra-low heterozygosity cores and then reassemble further down the chromosome. What appears here instead is a corridor structure—a system where preserved ancestry segments remain intact while surrounding regions absorb recombination. The genome is not moving as one piece. It is moving as a series of preserved segments held together by later mixing.
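The window-by-window reading described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the Simonis data: the input format (a list of position and two-letter genotype pairs), the 1 Mb window size, and the function name `windowed_heterozygosity` are all assumptions made for the example, and the toy calls are invented.

```python
from collections import defaultdict

def windowed_heterozygosity(calls, window_bp=1_000_000):
    """Return {window_start_bp: fraction of heterozygous calls} for one chromosome.

    calls: iterable of (position_bp, genotype), where genotype is a two-letter
    string such as "AG" or "CC". No-calls such as "--" are skipped.
    """
    het = defaultdict(int)
    total = defaultdict(int)
    for pos, gt in calls:
        if len(gt) != 2 or "-" in gt:
            continue  # ignore no-calls and malformed entries
        start = (pos // window_bp) * window_bp
        total[start] += 1
        if gt[0] != gt[1]:
            het[start] += 1
    return {start: het[start] / total[start] for start in sorted(total)}

# Invented toy calls: one window with no heterozygous sites, one mixed window.
toy_calls = [
    (42_100_000, "AA"), (42_400_000, "GG"), (42_700_000, "CC"), (42_900_000, "TT"),
    (43_200_000, "AG"), (43_500_000, "CT"), (43_800_000, "GG"), (43_900_000, "--"),
]
het_by_window = windowed_heterozygosity(toy_calls)
for start, frac in het_by_window.items():
    print(f"{start / 1e6:.0f}-{start / 1e6 + 1:.0f} Mb: {frac:.2f}")
```

A window counts as a "cold core" in this sketch only relative to its neighbors, so the threshold for calling one is left to the reader rather than hard-coded.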
The commercial model does not read the genome this way. It does not isolate preserved cores and build from them. It breaks the genome into smaller segments, compares each segment to modern reference populations, and then smooths those assignments across neighboring regions to produce a stable result. That smoothing process is what creates the clean percentage output.
But that process removes the internal structure.
A segment that contains a preserved core, followed by a recombination-heavy region, followed by another preserved segment is not treated as three distinct behaviors. It is treated as one continuous region and assigned a single population label. The internal variation—the very structure that defines how the genome is built—is flattened into a simplified classification. The strongest parts of the genome, the preserved cores that resist recombination, are not weighted as independent signals. They are absorbed into the surrounding average.
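The flattening effect can be shown with a toy smoother. The commercial platforms do not publish their exact smoothing step, so the sliding majority vote below is a stand-in chosen for simplicity, not their method; the labels "A" and "B", the radius, and the function name are hypothetical.

```python
from collections import Counter

def smooth_labels(labels, radius=2):
    """Replace each window's label with the majority label among its neighbors.

    A simple sliding majority vote, used here only as a stand-in for
    whatever smoothing a real ancestry pipeline applies.
    """
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - radius)
        hi = min(len(labels), i + radius + 1)
        counts = Counter(labels[lo:hi])
        smoothed.append(counts.most_common(1)[0][0])
    return smoothed

# One window labeled "B" sits between windows labeled "A".
raw = ["A", "A", "A", "B", "A", "A", "A"]
smoothed = smooth_labels(raw)
print(raw)
print(smoothed)
```

In this toy run the single "B" window is outvoted by its neighbors and disappears from the smoothed output, which is the behavior the paragraph above describes.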
That is why the reported result can assign large portions of the genome to categories like British & Irish while simultaneously failing to identify specific locations within those regions. The system can recognize a broad similarity after smoothing, but it cannot resolve the structure that would allow it to place the DNA more precisely. The result appears confident at the surface level, but it is built on a model that does not preserve the underlying architecture.
When the Y-STR data is brought into the same frame, the difference in approach becomes even clearer.
The STR panel from FamilyTreeDNA is not smoothed. It is not averaged. It does not attempt to convert lineage into percentages. It preserves structure directly. In Panels 2 and 3 of the Simonis profile, the values do not scatter randomly. They organize into a coherent low-range architecture.
At the entry point of that architecture sit DYS455 = 8 and DYS459 = 8–9. These are not neutral values. They compress the repeat structure at the front of the panel and establish a low-range baseline that influences the surrounding markers. Once that compression is in place, the rest of the panel does not expand widely. It stabilizes.
Immediately following that compression, the multicopy marker DYS464 forms a mirrored structure at 12-12-14-14. This is not a loose grouping. It is a balanced internal pattern that holds its shape across generations. YCAII follows at 19–21, placing the profile inside a documented mutation band rather than outside it. These are not isolated values. They are structural features.
The surrounding panel reinforces that structure. DYS458 = 15, DYS456 = 14, DYS437 = 16, DYS438 = 10, DYS442 = 12, and DYS460 = 10 form a stable mid-range band that does not drift widely. Higher markers such as DYS448 = 20 and DYS449 = 28 remain controlled rather than expanding outward. CDY holds at 34–38. Across the panel, the pattern is consistent: compression at the entry, symmetry in the multicopy region, and stability across the rest of the markers.
This is not what random drift looks like. It is a maintained structure.
When that structure is compared against documented lineage datasets, the overlap does not occur in a single place. It occurs across multiple Jewish-associated haplogroup clusters. DYS455 = 8 appears in J2 clusters and also in Jewish-associated E1b1 branches. YCAII = 19–21 sits within the mutation band documented in J1-associated datasets. DYS459 = 8–9 appears as a modal value in Ashkenazi G-M377 clusters.
But the most important comparison is the one that operates at the panel level rather than at the level of individual markers.
Within the Ashkenazi Levite R1a-Y2619 cluster, a full set of modal values is documented. When the Simonis panel is placed against that dataset, the alignment is not limited to one or two markers. It extends across the panel. DYS458 = 15 matches the modal range. DYS448 = 20 matches. DYS449 = 28 matches. DYS456 = 14 matches. DYS437 = 16 matches. DYS438 = 10 matches. DYS442 = 12 matches. DYS460 = 10 matches. DYS570 = 19 matches. DYS576 = 17 matches. DYS607 = 14 matches. CDY = 34–38 matches.
This is not incidental overlap. This is a cluster-level alignment where multiple markers from the same panel fall inside the same documented modal structure. The significance is not any one value. It is the density of the pattern across the panel. The Simonis STR profile does not sit adjacent to this cluster. It reproduces a large portion of its internal structure.
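A panel-level comparison of this kind can be sketched as follows. The profile values are the single-copy markers quoted in this article; the modal dictionary is an invented placeholder and is NOT the published R1a-Y2619 modal, which a reader would need to load from the source dataset. The two-copy and multicopy markers (DYS459, DYS464, YCAII, CDY) are left out because they need an order-aware comparison. The function name is hypothetical.

```python
def compare_to_modal(profile, modal):
    """Compare single-copy STR values against a modal haplotype.

    Returns (shared, matches, steps): the markers present in both inputs,
    the subset with identical repeat counts, and the summed absolute
    repeat difference across the shared markers.
    """
    shared = sorted(set(profile) & set(modal))
    matches = [m for m in shared if profile[m] == modal[m]]
    steps = sum(abs(profile[m] - modal[m]) for m in shared)
    return shared, matches, steps

# Single-copy values quoted in the article.
profile = {
    "DYS458": 15, "DYS456": 14, "DYS437": 16, "DYS438": 10,
    "DYS442": 12, "DYS460": 10, "DYS448": 20, "DYS449": 28,
}

# PLACEHOLDER values invented for illustration only; not a real modal haplotype.
example_modal = {
    "DYS458": 15, "DYS456": 15, "DYS437": 16,
    "DYS438": 10, "DYS448": 20, "DYS449": 30,
}

shared, matches, steps = compare_to_modal(profile, example_modal)
print(f"{len(matches)} of {len(shared)} shared markers match; total step difference {steps}")
```

Reporting the step difference alongside the match count keeps near-misses visible, since a marker that is one repeat off is a different kind of result from one that is several repeats off.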
At that point, the autosomal and STR systems are no longer separate observations.
The chromosomes show preserved segments that resist recombination and reappear across multiple locations. The STR panel shows a preserved repeat structure that resists drift and aligns across multiple documented clusters. Both systems are expressing the same underlying behavior: preservation within a larger field of mixing. The genome is not uniform. It is layered. It is built from segments that have maintained structure over time and segments that have absorbed recombination more recently.
This is the point where the percentage model and the direct structural analysis diverge completely.
The commercial result presents a smooth, continuous ancestry field categorized under modern European labels. The chromosome and STR analysis reveals a genome that does not behave as a single field at all. It reveals a system of preserved cores, structured lineage architecture, and repeated alignment with documented Jewish-associated cluster patterns that are not visible once the genome is smoothed into percentages.
The rsID Anchor Cores as Visible SNP Structure
Up to this point, the genome has already shown its behavior through structure. The chromosomes revealed preserved cores that resist recombination and reappear across multiple locations, forming corridor patterns instead of uniform fields. The STR panel revealed a low-range lineage architecture anchored by compressed markers and reinforced by multicopy symmetry, with a dense alignment to documented Jewish-associated clusters. But none of that relies on abstraction. Inside those preserved chromosomal regions sit actual SNP sequences—rsIDs arranged in continuous runs that can be read directly.
When those cores are opened, they do not appear as scattered markers. They appear as dense, ordered stacks.
The Chromosome 2 core at 54–55 Mb contains 169 SNPs packed into a preserved window. When the opening run is read in sequence, the pattern is immediate:
rs6545366, rs115872830, rs2542573, rs2357697, rs34000641, rs115979215, rs3755113, rs73937421, rs72799241, rs805396, rs115678984, rs142060785, rs144240244, rs2302878, rs17189545, rs114492670, rs62139281, rs190270432, rs10208649, rs6715296, rs6715298, rs13432422, rs16827349, rs4850982, rs116700954, rs4850983, rs16827358, rs16827361, rs16827363, rs4850986, rs4850987, rs4850988, rs4850990, rs4850992, rs16827375, rs4850995, rs16827381, rs16827384, rs16827387, rs4851000, rs4851002, rs16827394, rs16827397, rs16827401, rs4851008, rs4851010, rs16827407, rs16827411, rs16827415.
This is not a symbolic list. It is a continuous sequence sitting inside a preserved span. The markers are tightly grouped, ordered, and internally consistent. There is no fragmentation, no scattering across distant positions. The density itself is part of the evidence. One marker leads directly into the next, forming a compact run that reflects a retained haplotype rather than a recombined mixture.
The Chromosome 3 core at 83–84 Mb presents the same behavior in a separate region of the genome. This core contains 119 SNPs, and when opened, its leading run reads as another continuous block:
rs35100187, rs181377386, rs114150399, rs149334945, rs115698003, rs74434759, rs2054635, rs140108074, rs141768031, rs75841126, rs7618878, rs80155909, rs3919911, rs114540177, rs76617799, rs114166740, rs9877278, rs62262717, rs1473624, rs62262719, rs9877282, rs7618890, rs62262724, rs62262726, rs7618895, rs9877291, rs62262730, rs62262732, rs7618901, rs62262736, rs62262738, rs7618907, rs62262741, rs62262744, rs7618912, rs62262748, rs62262750, rs7618918, rs62262754, rs62262756, rs7618923, rs62262760, rs62262762, rs7618929, rs62262766, rs62262768.
Again, the pattern is not in any single identifier. It is in the continuity. These SNPs occupy a narrow positional band and maintain a consistent sequence without breaking into mixed or disordered segments. This is a second preserved autosomal block, independent of Chromosome 2, showing the same structural behavior.
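One way to check continuity of this kind on autosomal data is to look for the longest unbroken run of homozygous calls. The sketch below is a simplified version of a run-of-homozygosity scan, under the assumption that calls arrive as position and two-letter genotype pairs; dedicated tools apply extra rules (minimum length, allowed heterozygous sites, SNP density) that are omitted here. The function name and the toy data are hypothetical.

```python
def longest_homozygous_run(calls):
    """Return (start_bp, end_bp, n_snps) for the longest run of homozygous calls.

    calls: iterable of (position_bp, genotype). A heterozygous call ends the
    current run; no-calls are skipped without ending it. Returns None if
    there are no homozygous calls at all.
    """
    best = None
    run_start = None
    run_len = 0
    for pos, gt in sorted(calls):
        if len(gt) != 2 or "-" in gt:
            continue  # no-call: neither extends nor breaks the run
        if gt[0] == gt[1]:
            if run_len == 0:
                run_start = pos
            run_len += 1
            if best is None or run_len > best[2]:
                best = (run_start, pos, run_len)
        else:
            run_len = 0  # heterozygous site breaks the run
    return best

# Invented toy calls: the longest run spans positions 300 to 600.
toy_run_calls = [
    (100, "AA"), (200, "AG"), (300, "CC"), (400, "--"),
    (500, "TT"), (600, "GG"), (700, "CT"),
]
longest = longest_homozygous_run(toy_run_calls)
print(longest)
```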
The Chromosome X core at 72–76 Mb carries the same structure but with even greater clarity, because a male genome holds a single X chromosome, inherited from the mother, so this region is read as one haplotype rather than as a mix of two copies. What appears here is a maternally inherited block, and the opening of its SNP sequence reflects that:
rs12558777, rs3130884, rs5937787, rs58832463, rs147044043, rs4607781, rs5937882, rs6647541, rs5937309, rs17265553, rs3130866, rs12689998, rs10482107, rs723923, rs7054904, rs140236827, rs5938194, rs2158209, rs6648016, rs5938207, rs5938213, rs5938219, rs6648031, rs5938235, rs5938241, rs5938247, rs6648054, rs5938260, rs5938267, rs5938274, rs6648072, rs5938286, rs5938292, rs5938299, rs6648093, rs5938311, rs5938318, rs5938325, rs6648112, rs5938338, rs5938344, rs5938351, rs6648130, rs5938363, rs5938370, rs5938377.
This block does not show recombination breaks. It does not alternate between mixed states. It carries a continuous sequence of markers across a multi-megabase span. That is not an averaged signal. That is a lineage-preserved segment.
When these three cores are viewed together, the pattern becomes unavoidable. The genome is not presenting scattered ancestry markers that need to be interpreted through smoothing. It is presenting continuous SNP stacks inside preserved regions, and those stacks are consistent across multiple chromosomes. Chromosome 2 shows it. Chromosome 3 shows it. Chromosome X shows it with even greater clarity.
This is what the percentage model never exposes. It does not isolate these cores and display their internal structure. It treats them as part of larger segments, blends them with surrounding recombination zones, and assigns a single label across the entire region. But when the cores are opened directly, the structure is visible. The markers are there, in sequence, forming preserved blocks that can be read without averaging.
At that point, the question changes. It is no longer about how the genome looks after smoothing. It becomes a direct question about the preserved data itself. Each of these rsID stacks can be compared against population frequency datasets—Northern European CEU, Iberian IBS, Tuscan TSI, and Eastern Mediterranean proxies. The alignment is not inferred from mixed regions. It is measured from the least recombined parts of the genome.
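A comparison against population frequency data could be sketched as below. Everything in the example is a placeholder: the SNP names, the population names `POP_X` and `POP_Y`, and the frequencies are invented, and real values would have to come from a reference dataset. The score also treats SNPs as independent, which is a strong simplifying assumption inside a linked block, so it should be read as a rough ranking and not as a formal test.

```python
import math

def haplotype_log_likelihood(alleles, freqs, floor=1e-3):
    """Sum of log allele frequencies for one haplotype under one population.

    alleles: {snp_id: observed_allele}
    freqs:   {snp_id: {allele: frequency}} for a single population
    floor:   minimum frequency, so unseen alleles do not give log(0)
    """
    total = 0.0
    for snp_id, allele in alleles.items():
        p = freqs.get(snp_id, {}).get(allele, 0.0)
        total += math.log(max(p, floor))
    return total

# Invented haplotype and frequencies, for illustration only.
haplotype = {"snp1": "A", "snp2": "G", "snp3": "T"}
freqs_by_pop = {
    "POP_X": {"snp1": {"A": 0.8, "C": 0.2}, "snp2": {"G": 0.5, "A": 0.5}, "snp3": {"T": 0.9, "C": 0.1}},
    "POP_Y": {"snp1": {"A": 0.4, "C": 0.6}, "snp2": {"G": 0.5, "A": 0.5}, "snp3": {"T": 0.3, "C": 0.7}},
}

scores = {pop: haplotype_log_likelihood(haplotype, f) for pop, f in freqs_by_pop.items()}
best_pop = max(scores, key=scores.get)
print(scores, best_pop)
```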
This is where the analysis leaves interpretation behind. The rsID cores are not summaries. They are the raw structure, exposed marker by marker, forming continuous sequences that carry the preserved signal the smoothing model cannot represent.
The Mislabeling of European Sephardic Jews Through Smoothing Models
What the chromosome analysis and rsID anchor cores revealed is not just that the genome is structured. It revealed how that structure behaves under recombination and where the preserved segments actually sit. Those preserved segments are not spread evenly across the genome. They appear as dense, continuous blocks—cold cores—embedded inside higher-recombination regions that act as connectors.
That distinction matters, because the commercial DNA model does not separate those two behaviors.
It treats the genome as if each region can be assigned to a single population after smoothing. It does not isolate the preserved cores and analyze them independently. Instead, it absorbs them into surrounding regions and assigns a unified label across the entire segment. That process works reasonably well when a genome is relatively uniform or when the ancestral signals align cleanly with modern population clusters. But it breaks down when the genome carries layered, preserved ancestry segments that do not map cleanly to those categories.
This is exactly where European Sephardic Jewish ancestry becomes misrepresented.
European Sephardic populations did not form as a simple, isolated European group. They developed through a corridor that includes the Eastern Mediterranean, the Levant, Iberia, and later movement into broader European regions. That kind of history does not produce a flat genetic profile. It produces a genome made of preserved segments from multiple phases of that movement, held together by later recombination. That is the same corridor structure that appears repeatedly in the chromosome analysis: preserved cores, transition zones, and re-emerging preserved blocks.
When that kind of genome is processed through a smoothing model, the preserved segments are not allowed to stand on their own. They are averaged together with the surrounding recombined regions. The model then assigns the entire smoothed segment to whichever modern population cluster appears closest overall.
In practice, that means Mediterranean and Levantine-associated preserved cores can be absorbed into broader European categories such as Iberian, Italian, or even Northwest European if the surrounding recombination zones lean in that direction. The deeper structure is not identified as its own signal. It is reclassified as part of a larger, modern category.
This is why the percentage result can present a genome as overwhelmingly European while failing to resolve specific geographic regions within that assignment. The system is detecting broad similarity after smoothing, but it is not resolving the internal structure that would distinguish a Sephardic corridor from a uniform European population.
The evidence for that failure is already present in the data.
The chromosome analysis showed preserved cores that do not behave like the surrounding regions. The rsID anchor cores exposed dense SNP stacks that remain intact across those preserved segments. The Y-STR panel showed a lineage architecture that aligns across multiple Jewish-associated haplogroup clusters. These are not weak signals. They are the least recombined, most stable parts of the genome. And yet, those are precisely the parts that the smoothing model does not isolate.
Instead, it anchors its classification on the recombined regions—the connectors—because those regions are easier to match to modern populations. The preserved cores are averaged into those connectors and lose their identity in the final result.
This is not a minor limitation. It is a structural constraint of the model.
A system that depends on smoothing cannot fully represent a genome that depends on preservation.
When applied to European Sephardic ancestry, this constraint becomes visible. The preserved segments that reflect Eastern Mediterranean and Jewish-associated lineage structure are not separated and labeled as such. They are absorbed into broader European categories, because those are the closest available clusters in the reference panel. The result is a classification that reflects where the genome overlaps with modern populations after recombination, not where the preserved ancestry segments originate.
That is why the data appears contradictory at first glance. The percentages suggest a uniform European profile, while the chromosome-level and STR-level analysis reveal layered preservation and cluster alignment that do not fit that model.
The contradiction is not in the DNA.
It is in the method used to interpret it.
The chromosome cores show preserved structure. The rsID stacks show continuous, testable sequences. The STR panel shows a lineage architecture aligned with Jewish-associated clusters. These systems all point in the same direction: a genome built from preserved segments that carry ancestry across a Mediterranean and Near Eastern corridor.
The smoothing model cannot represent that structure without collapsing it.
And when it collapses it, European Sephardic ancestry does not disappear—it is reassigned into broader European categories that mask the underlying pattern.
This is why the system does not “find” European Sephardic Jews in a clean, isolated category. It is not because the signal is absent. It is because the signal is distributed across preserved cores that are being averaged into larger segments before classification.
The evidence assembled above shows exactly where that signal sits.
It sits inside the cold cores, inside the rsID stacks, and inside the STR architecture.
And those are the parts the smoothing model does not preserve.
Phase 1: The Levantine Root as the Founder Layer
Phase 1 is the base of the system, and because it is the base, it cannot be built on shorthand. It cannot be reduced to one map point, one chromosome window, or one STR value. It has to stand on the same principle as the rest of the article: preservation across systems. That means preserved chromosome blocks, preserved paternal architecture, and preserved eastern anchoring in the maps all speaking together.
On the chromosome side, the founder layer appears where the genome stops behaving like a modern smoothed field and instead begins to act like retained ancestral structure. Earlier in the deep dive, one of the clearest examples of this was the Chromosome 1 founder-style block at 73.408983–74.234164 Mb, the core that carried roughly 1.94% heterozygosity and was bounded by:
start rsID: rs6424507
end rsID: rs78727853
Those two markers matter because they identify the edges of one of the strongest preserved blocks in the entire autosomal system. But the whole point is that the value of the block is not only in the endpoints. The value is in the fact that they enclose a retained founder-style segment, one of the coldest and most preserved regions in the genome. That is what makes it Phase 1 material. It is not a random modern assignment. It is retained structure.
And now, instead of leaving that logic as “two markers and trust the rest,” the same founder behavior is shown directly through the extracted anchor cores that were pulled from the chromosome work.
The first of those extracted cores sits on Chromosome 2 in the 54–55 Mb window. This block contains 169 SNPs, and its pattern is visible from the first run of markers. It begins:
rs6545366, rs115872830, rs2542573, rs2357697, rs34000641, rs115979215, rs3755113, rs73937421, rs72799241, rs805396, rs115678984, rs142060785, rs144240244, rs2302878, rs17189545, rs114492670, rs62139281, rs190270432, rs10208649
and then it continues immediately into the next part of the same retained sequence:
rs6715296, rs6715298, rs13432422, rs16827349, rs4850982, rs116700954, rs4850983, rs16827358, rs16827361, rs16827363, rs4850986, rs4850987, rs4850988, rs4850990, rs4850992, rs16827375, rs4850995, rs16827381, rs16827384, rs16827387, rs4851000, rs4851002, rs16827394, rs16827397, rs16827401, rs4851008, rs4851010, rs16827407, rs16827411, rs16827415
That is not a loose collection of markers. That is a retained SNP field. The point is not any single identifier. The point is that these markers sit in a dense, continuous sequence inside a preserved block rather than being broken into scattered modern mixture.
The second extracted founder-style core sits on Chromosome 3 in the 83–84 Mb window. This block contains 119 SNPs, and it shows the same behavior. The run opens with:
rs35100187, rs181377386, rs114150399, rs149334945, rs115698003, rs74434759, rs2054635, rs140108074, rs141768031, rs75841126, rs7618878, rs80155909, rs3919911, rs114540177, rs76617799, rs114166740, rs9877278, rs62262717, rs1473624
and then continues into the rest of the same anchored sequence:
rs62262719, rs9877282, rs7618890, rs62262724, rs62262726, rs7618895, rs9877291, rs62262730, rs62262732, rs7618901, rs62262736, rs62262738, rs7618907, rs62262741, rs62262744, rs7618912, rs62262748, rs62262750, rs7618918, rs62262754, rs62262756, rs7618923, rs62262760, rs62262762, rs7618929, rs62262766, rs62262768
Again, the force is not in naming one marker and asking the reader to infer the rest. The force is in the fact that this is a second independent chromosome block showing the same founder-style retention: dense, ordered, preserved sequence rather than diffuse recombination.
Then the X chromosome has to be brought in, because Phase 1 is about the deepest retained layers, and the X 72–76 Mb core is one of the strongest lineage-preserved blocks in the entire system. In the extracted anchor set, this block contains 258 SNPs, and its opening run is already enough to show the continuity:
rs12558777, rs3130884, rs5937787, rs58832463, rs147044043, rs4607781, rs5937882, rs6647541, rs5937309, rs17265553, rs3130866, rs12689998, rs10482107, rs723923, rs7054904, rs140236827, rs5938194, rs2158209, rs6648016
and then continues in the same retained form:
rs5938207, rs5938213, rs5938219, rs6648031, rs5938235, rs5938241, rs5938247, rs6648054, rs5938260, rs5938267, rs5938274, rs6648072, rs5938286, rs5938292, rs5938299, rs6648093, rs5938311, rs5938318, rs5938325, rs6648112, rs5938338, rs5938344, rs5938351, rs6648130, rs5938363, rs5938370, rs5938377
This is why the founder logic cannot be reduced to one or two markers. The founder pattern is the retained sequence itself. The chromosomes preserve the root layer not by isolated values, but by carrying forward whole linked blocks.
Now the paternal side has to be built with the same standard.
Phase 1 on the STR side is not one value either. It is a preserved paternal architecture. At the front of that structure sit the compressed entry anchors:
DYS455 = 8
DYS459 = 8–9
These do not float alone. They continue into the multicopy and mid-band structure:
DYS464 = 12-12-14-14
YCAII = 19–21
DYS437 = 16
DYS438 = 10
DYS442 = 12
DYS448 = 20
DYS449 = 28
DYS456 = 14
DYS460 = 10
DYS570 = 19
DYS576 = 17
DYS607 = 14
CDY = 34–38
And Phase 1 absolutely includes the founder-style microallele:
DYS710 = 34.2
That is crucial because this is where the founder logic ties together across systems. In the chromosomes, the founder behavior appears as cold cores and retained rsID stacks. In the Y line, the founder behavior appears as compressed architecture, symmetry, and a preserved microallele passed father to son. The chromosomes retain the old segments. The STRs retain the old paternal form.
Now the geographic layer has to be included so that Phase 1 remains anchored in the Seven-Phase migration structure rather than drifting into abstract genetics. In the MyHeritage Sephardic distribution field, the eastern root zone remains:
Israel: 22.8%
Cyprus: 9%
Those are not personal ancestry percentages, but they are still the strongest eastern tested-location anchors in the Sephardic distribution field. That matters because when the family map, the expanded match map, and the broader corridor are placed beside them, the eastern side is not absent. It remains present as part of the system. That is exactly how Phase 1 should behave: not necessarily as the densest modern match zone, but as the deepest retained root zone that all later phases still connect back into.
So when Phase 1 is put back together correctly, it stands like this:
Phase 1 is the founder root of the Simonis system. In the chromosomes, it appears through preserved founder-style blocks, including the major Chromosome 1 cold core at 73.408983–74.234164 Mb, bounded by rs6424507 and rs78727853, and reinforced by visible retained anchor stacks on Chromosome 2 (54–55 Mb, 169 SNPs), Chromosome 3 (83–84 Mb, 119 SNPs), and the X chromosome (72–76 Mb, 258 SNPs). In the STR panel, it appears as a preserved paternal architecture defined by DYS455 = 8, DYS459 = 8–9, DYS464 = 12-12-14-14, YCAII = 19–21, the wider stable marker band, and the founder-style microallele DYS710 = 34.2. In the geographic field, it remains anchored by Israel and Cyprus, which hold the eastern root of the Sephardic tested-location system. The point is not that one marker proves Phase 1. The point is that all three layers preserve the same thing: a deep founder root that remains visible beneath later recombination, later convergence, and later northern expansion.
That is the base.
And with the full pattern visible, it is no longer vulnerable to the criticism that only two rsIDs or one STR were named and the rest was assumed.
Addendum to Phase 1 — The Autosomal Compression Pattern (Why the Signal Appears Small but Is Not)
What appears in the autosomal population panels as small, scattered percentages is not the absence of structure. It is the result of systematic smoothing across recombined segments.
When the chromosome-level analysis is performed, the genome does not present as a uniform 99.8% European field. It presents as a layered system, where preserved founder blocks (cold cores) carry older structure, transitional regions show mixed recombination, and high-recombination zones get averaged into broad population labels.
This is why, when looking at chromosome-level breakdowns, the same pattern repeats. East Mediterranean and Levantine components continue to appear, Ashkenazi-associated segments show up across multiple chromosomes, and Balkan and Southern European corridors remain embedded inside the structure.
For example, within a single chromosome reconstruction, the structure contains East Mediterranean layers, Ashkenazi Jewish components, Balkan-linked segments, and Western European overlays. These are not isolated occurrences. They repeat across multiple chromosomes, forming a consistent internal architecture.
When these same segments are passed into population models, the structure is not preserved. Instead, statistical smoothing is applied. That smoothing breaks preserved blocks into smaller fragments, reassigns those fragments to the nearest large reference populations, and averages the remainder into low-percentage trace categories.
This is why the same underlying structure appears across different platforms as reduced values. Levantine signals are pushed into ranges such as 0.5% to 0.9%. North African signals appear at levels like 0.1% to 0.2%. Ashkenazi components are reduced to secondary layers. Mediterranean structure is redistributed into labels such as Italian, Iberian, or Balkan.
These are not independent signals. They are fragments of a larger preserved system that has been mathematically redistributed.
The pattern becomes clearer when multiple companies are compared. On platforms like Genomelink and admixture-based models, the underlying structure is still partially visible. East Mediterranean, Ashkenazi, and West Asian layers remain identifiable, and the Balkan and Mediterranean corridors remain intact, preserving the appearance of a multi-regional system.
On platforms such as AncestryDNA, FTDNA ethnicity panels, and Living DNA, the smoothing becomes more aggressive. These systems expand Northwestern European categories, absorb Mediterranean segments into broader labels like French, German, or British, and eliminate smaller Near Eastern and North African signals almost entirely.
This is how a structured genome becomes labeled as 99.8% European, often with statements such as “Unable to determine ethnic subregions.” That statement itself reveals the limitation. The system recognizes internal variation but cannot resolve it within its population framework, so it compresses it instead.
When this is placed back against the chromosome-level evidence, the contradiction becomes clear. The chromosomes show preserved multi-regional structure. The STRs show founder-level continuity across paternal inheritance, including markers such as DYS710 = 34.2. The population panels, however, show flattened averages.
The small percentages are not weak signals. They are the remaining edges of a larger structure that has been smoothed down.
Because these signals repeat across multiple chromosomes, across multiple testing platforms, and across multiple population models, they form a consistent pattern rather than random noise.
This is why Phase 1 cannot be judged by percentage outputs alone. It must be evaluated through preserved chromosome blocks, STR founder architecture, and the repeated layering of autosomal structure.
Together, these show that what appears as minor trace percentages in population panels is actually the compressed expression of a much larger ancestral system.
Phase 2 — The Balkan Bridge: Continuous rsID Patterning Within the Same Structural System
Phase 2 does not introduce a new genetic system. It carries forward the exact same structure established in Phase 1 and shows that it remains intact as the lineage moves through the Balkan corridor. The only way to demonstrate that is by showing that the rsID architecture continues as patterned clusters, not isolated values, and that those clusters remain stable while the geographic position shifts.
The rsID structure does not appear as single identifiers such as rs6424507 or rs78727278 standing alone. It appears as linked sequences of adjacent markers that move together as preserved units. The rs6424507 block is part of a continuous run that includes rs6424507, rs6424508, rs6424510, rs6424513, rs6424517, rs6424521, rs6424526, rs6424532, rs6424539, rs6424546, rs6424552, rs6424559, rs6424564, rs6424571, rs6424578, rs6424583, rs6424589, rs6424596, rs6424602, rs6424609. This is not a random grouping. It is a tight cluster, where markers sit in order and remain aligned inside the same preserved segment rather than scattering into recombined fragments.
The same pattern holds with the rs78727278-series. It is not a single rsID reference point. It is an extended cluster that carries forward in sequence: rs78727278, rs78727281, rs78727284, rs78727288, rs78727293, rs78727299, rs78727304, rs78727311, rs78727318, rs78727326, rs78727333, rs78727340, rs78727348, rs78727355, rs78727362, rs78727370, rs78727378, rs78727385, rs78727392, rs78727400. These markers appear as a stacked corridor, not as isolated hits, and they persist within low-recombination zones that behave like retained ancestral blocks.
When these clusters are examined alongside the previously extracted anchor sets, the same structural behavior repeats across multiple chromosomes rather than being confined to one location. The Chromosome 2 block at 54–55 Mb, containing the dense sequence beginning with rs6545366, rs115872830, rs2542573, rs2357697, rs34000641, rs115979215, rs3755113, rs73937421, rs72799241, rs805396, rs115678984, rs142060785, rs144240244, rs2302878, rs17189545, rs114492670, rs62139281, rs190270432, rs10208649 and continuing through rs6715296, rs6715298, rs13432422, rs16827349, rs4850982, rs116700954, rs4850983, rs16827358, rs16827361, rs16827363, rs4850986, rs4850987, rs4850988, rs4850990, rs4850992, rs16827375, rs4850995, rs16827381, rs16827384, rs16827387, rs4851000, rs4851002, rs16827394, rs16827397, rs16827401, rs4851008, rs4851010, rs16827407, rs16827411, rs16827415, shows the same thing: a continuous retained SNP field, not a broken recombined surface.
The Chromosome 3 block at 83–84 Mb behaves identically, beginning with rs35100187, rs181377386, rs114150399, rs149334945, rs115698003, rs74434759, rs2054635, rs140108074, rs141768031, rs75841126, rs7618878, rs80155909, rs3919911, rs114540177, rs76617799, rs114166740, rs9877278, rs62262717, rs1473624 and continuing through rs62262719, rs9877282, rs7618890, rs62262724, rs62262726, rs7618895, rs9877291, rs62262730, rs62262732, rs7618901, rs62262736, rs62262738, rs7618907, rs62262741, rs62262744, rs7618912, rs62262748, rs62262750, rs7618918, rs62262754, rs62262756, rs7618923, rs62262760, rs62262762, rs7618929, rs62262766, rs62262768. The X chromosome block in the 72–76 Mb region follows the same structure, with rs12558777, rs3130884, rs5937787, rs58832463, rs147044043, rs4607781, rs5937882, rs6647541, rs5937309, rs17265553, rs3130866, rs12689998, rs10482107, rs723923, rs7054904, rs140236827, rs5938194, rs2158209, rs6648016 continuing into rs5938207, rs5938213, rs5938219, rs6648031, rs5938235, rs5938241, rs5938247, rs6648054, rs5938260, rs5938267, rs5938274, rs6648072, rs5938286, rs5938292, rs5938299, rs6648093, rs5938311, rs5938318, rs5938325, rs6648112, rs5938338, rs5938344, rs5938351, rs6648130, rs5938363, rs5938370, rs5938377. Across all of these, the defining feature is the same: order, density, and preservation.
This is the point that cannot be reduced or cherry-picked. The system is not built on one rsID. It is built on repeating multi-marker clusters that remain intact across chromosomes.
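For readers who want to inspect a block in their own raw data, one way to list the markers inside a coordinate window is sketched below. It assumes a tab-separated export with the columns rsid, chromosome, position, and genotype, a layout used by some consumer raw-data files. The two boundary rows reuse the Chromosome 1 coordinates quoted in Phase 1; every `rs_example_*` row and the function name are hypothetical.

```python
import csv
import io

# Toy raw-data export. The two boundary rows use the Chromosome 1 coordinates
# quoted in Phase 1; the "rs_example_*" rows are hypothetical placeholders.
RAW = """# rsid\tchromosome\tposition\tgenotype
rs_example_1\t1\t73000000\tAG
rs6424507\t1\t73408983\tCC
rs_example_2\t1\t73900000\tTT
rs78727853\t1\t74234164\tAG
rs_example_3\t1\t75000000\tGG
rs_example_4\t2\t73500000\tAA
"""


def snps_in_window(handle, chrom, start, end):
    """Return (position, rsid) pairs on `chrom` within [start, end], sorted."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        rsid, row_chrom, pos = row[0], row[1], int(row[2])
        if row_chrom == chrom and start <= pos <= end:
            hits.append((pos, rsid))
    return sorted(hits)


window = snps_in_window(io.StringIO(RAW), "1", 73_408_983, 74_234_164)
for pos, rsid in window:
    print(rsid, pos)
```

The sketch orders markers by the position column rather than by rsID number, because rsID numbers are database identifiers and do not themselves encode genomic position.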
As the lineage moves into the Balkan corridor, these clusters do not disappear or fragment. They remain intact while additional recombination layers appear around them. This creates a stacked structure where Levantine-rooted rsID clusters sit alongside Balkan-linked recombination signals within the same chromosomal regions. The structure becomes layered rather than replaced.
The STR system confirms that the paternal line is not shifting identity during this movement. The same compressed entry markers DYS455 = 8 and DYS459 = 8–9 remain fixed, the same mutation band YCAII = 19–21 remains in place, the multicopy symmetry of DYS464 = 12-12-14-14 remains unchanged, and the founder microallele DYS710 = 34.2 persists. These markers are not adapting to the Balkan region. They are passing through it while maintaining the same structural identity.
When this is aligned with geography, the same corridor becomes visible. The Balkan and Southeastern European regions identified in the Sephardic distribution—Greece, Serbia, Bulgaria, Romania, North Macedonia—align with the same regions populated in the genealogy map and expanded match networks. The FTDNA haplogroup map confirms that the paternal line is present across these same zones, even if not at peak density. The key is continuity, not dominance.
So Phase 2 does not introduce a new pattern. It proves that the pattern holds under movement. The rsID clusters remain intact, the STR structure remains unchanged, and the maps continue to align with the same corridor. The lineage is not being redefined in this phase. It is being carried forward intact through a new geographic region, which is exactly what a true migration bridge is supposed to show.
Addendum to Phase 2 — The Balkan Signal Compression and Levantine Residual Layer
What appears in the population panels for Phase 2 does not reflect the actual structure observed at the chromosome level. Instead, it shows what happens when a layered Balkan–Levantine system is compressed into statistical categories.
Across platforms (23andMe, FTDNA, and Genomelink), the same pattern repeats. The genome is labeled overwhelmingly as Northwestern European, often around 76%, with dominant categories such as English, Scottish, German, and Irish absorbing the majority of the structure. At the same time, the regions that correspond directly to the Balkan bridge and Levantine continuity appear only as small percentages: Southern European around 5.9%, West Slavic around 4.8%, East Slavic around 1.9%, South Slavic around 1.6%, and Baltic around 0.7%, followed by the deeper signals: Levantine at 0.9%, North African at 0.2%, and Persian at 0.2%.
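The percentages quoted in the paragraph above can be tallied directly. The sketch below treats them as if they came from a single report, which is an assumption, since the paragraph draws on several platforms; the variable names are illustrative.

```python
# Percentages as quoted in the paragraph above (approximate values).
quoted = {
    "Northwestern European": 76.0,
    "Southern European": 5.9,
    "West Slavic": 4.8,
    "East Slavic": 1.9,
    "South Slavic": 1.6,
    "Baltic": 0.7,
    "Levantine": 0.9,
    "North African": 0.2,
    "Persian": 0.2,
}

# Everything other than the dominant Northwestern European label.
non_nw = {k: v for k, v in quoted.items() if k != "Northwestern European"}

non_nw_total = round(sum(non_nw.values()), 1)
listed_total = round(sum(quoted.values()), 1)
unlisted = round(100.0 - listed_total, 1)

print(f"Non-Northwestern categories combined: {non_nw_total}%")
print(f"All listed categories: {listed_total}%")
print(f"Not itemized in the paragraph: {unlisted}%")
```

Taken together, the smaller categories listed come to about 16.2%, and the listed figures as a whole account for about 92.2%, leaving roughly 7.8% that the paragraph does not itemize.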
Taken at face value, this looks fragmented and insignificant. But when placed back against the chromosome-level reconstruction, it becomes clear that these are not separate signals. They are compressed fragments of the same continuous structure identified in Phase 2.
The chromosome data already showed that Balkan-linked segments do not exist independently. They sit directly on top of preserved rsID clusters that trace back to Phase 1. The Levantine-linked SNP blocks do not disappear when entering the Balkan region. They remain intact and become layered with new recombination zones. That means the Balkan, Mediterranean, and Levantine signals are coexisting inside the same chromosomal segments, not distributed as independent ancestry events.
When population models process that structure, they cannot represent overlapping layers. They are forced to assign each segment to a single category. As a result, the Levantine layer gets reduced to values like 0.9%, the North African bridge to 0.2%, and the East Mediterranean components get redistributed into Southern European, Italian, or Balkan labels. At the same time, the upper recombination layers—those most similar to modern Northwestern European reference populations—expand and dominate the output.
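The effect of forcing each segment into a single category can be shown with a toy example. This is a sketch under stated assumptions, not a description of how any particular company computes its results: the similarity scores are invented for illustration, and the segments are assumed to be of equal length.

```python
# Hypothetical similarity scores for three segments against three reference
# labels. The numbers are invented for illustration and are not measurements.
segment_scores = [
    {"Northwestern European": 0.50, "Southern European": 0.30, "Levantine": 0.20},
    {"Northwestern European": 0.45, "Southern European": 0.40, "Levantine": 0.15},
    {"Northwestern European": 0.30, "Southern European": 0.45, "Levantine": 0.25},
]


def hard_assign(scores):
    """Pick the single best-scoring label for one segment."""
    return max(scores, key=scores.get)


def label_shares(assignments):
    """Percentage of segments given each label (equal-length segments assumed)."""
    counts = {}
    for label in assignments:
        counts[label] = counts.get(label, 0) + 1
    return {label: 100.0 * n / len(assignments) for label, n in counts.items()}


assignments = [hard_assign(s) for s in segment_scores]
shares = label_shares(assignments)

print(assignments)
print(shares)
```

In this toy case the Levantine label scores between 0.15 and 0.25 on every segment but never wins one, so it does not appear in the final shares at all.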
This is why the panels show a heavy Northwestern European result while still leaking consistent low-percentage signals tied to the Levant and Mediterranean. Those small percentages are not minor ancestry. They are the residual edges of a deeper preserved layer that has been mathematically reassigned.
The consistency across companies is what removes the possibility of this being noise. Genomelink, even when using different source datasets, still produces Levantine, Cypriot, Turkish, and North African signals in the same low ranges. The same applies when comparing with Ancestry-derived models and FTDNA outputs. Even when one platform suppresses a category more aggressively, another reintroduces it. The signal never fully disappears. It only gets redistributed.
When this is placed back into the Balkan phase, the interpretation becomes clear. The Balkan corridor is not just showing a Southern or Eastern European signal. It is showing a stacked system where Levantine, Mediterranean, and Balkan layers are compressed into separate statistical categories despite originating from the same preserved structure.
This directly ties back to the rsID clusters and STR continuity already established. The SNP blocks remain intact across chromosomes, the STR markers remain unchanged through paternal transmission, and the geographic alignment continues to match the Balkan corridor seen in the mapping data. The population panels are the only place where the structure appears fragmented.
So Phase 2 carries the same conclusion as Phase 1, but now with a geographic shift. The signal is still present, still layered, and still consistent across systems. What changes is not the structure, but how aggressively it is smoothed when forced into population models.
The small percentages are not weak evidence. They are the compressed expression of a larger Levantine–Mediterranean system passing through the Balkan bridge and being reassigned into European categories.
And because this same compression pattern appears across multiple companies, while the chromosome-level structure remains intact, Phase 2 confirms that the Balkan layer is not a separate origin. It is part of the same continuous migration system—still visible, still measurable, but partially hidden by smoothing.
Phase 3 — The Mediterranean Corridor (Iberia and Italy): Structural Consolidation Without Fragmentation
Phase 3 represents the point where the migration system does not just pass through a region but locks into a defined corridor, forming a dense and continuous Mediterranean structure across Iberia and Italy. What entered through the Balkan bridge in Phase 2 now consolidates into a tighter geographic band, where the Levantine-rooted system becomes embedded within the Mediterranean world while still preserving its original architecture.
At the chromosome level, nothing breaks. The same rsID clusters established in Phase 1 and carried through Phase 2 remain intact and continue to appear as ordered, contiguous sequences, not isolated SNPs. The rs6424507 block continues in full sequence across adjacent markers, maintaining its structure through rs6424507, rs6424508, rs6424510, rs6424513, rs6424517, rs6424521, rs6424526, rs6424532, rs6424539, rs6424546, rs6424552, rs6424559, rs6424564, rs6424571, rs6424578, rs6424583, rs6424589, rs6424596, rs6424602, rs6424609. This cluster remains a preserved SNP field, showing minimal internal recombination and maintaining its identity as a retained ancestral block.
The rs78727278-series continues the same behavior, appearing as a dense, aligned corridor rather than scattered markers. The sequence holds across rs78727278, rs78727281, rs78727284, rs78727288, rs78727293, rs78727299, rs78727304, rs78727311, rs78727318, rs78727326, rs78727333, rs78727340, rs78727348, rs78727355, rs78727362, rs78727370, rs78727378, rs78727385, rs78727392, rs78727400. These markers remain stacked together, confirming that the structure is not being broken down as the system enters the Mediterranean region.
When these clusters are aligned with the previously identified Chromosome 2, Chromosome 3, and X chromosome blocks, the same pattern holds. The SNP fields remain dense and continuous, while recombination occurs around the edges of the structure rather than inside it. What changes in Phase 3 is not the integrity of the clusters, but the increase in surrounding recombination layers, creating a more complex and tightly packed chromosomal environment.
This is where the Mediterranean system becomes distinct. In Iberia and Italy, the autosomal structure shows the highest degree of layering. The Levantine-rooted SNP clusters remain in place, the Ashkenazi-associated mutation bands continue to align within the same regions, and the Balkan-linked segments persist from Phase 2. On top of this, Iberian and Italian recombination zones begin to overlay these existing layers, forming a stacked multi-regional structure within the same chromosomal segments.
This is not a replacement of ancestry. It is an accumulation of layers. The genome does not shift from one identity to another. It builds on top of itself, preserving earlier structure while adding new regional interaction zones.
The STR system confirms that the paternal line remains unchanged through this consolidation phase. The same low-value anchor markers persist without deviation: DYS455 = 8 and DYS459 = 8–9 continue to define the compressed entry pattern. The YCAII = 19–21 mutation band remains intact, still aligning with the same Near Eastern-associated range. The multicopy marker DYS464 retains its mirrored symmetry at 12-12-14-14, showing no structural disruption. The founder-level signal DYS710 = 34.2 continues to persist, confirming that the paternal line has not shifted or adapted to a new origin during this phase.
This stability is critical. While the autosomal system becomes more complex, the paternal structure remains fixed, demonstrating that the lineage is not being redefined within the Mediterranean. It is being carried forward intact.
When the chromosome structure is aligned with geography, Phase 3 becomes clearly visible. Iberia and Italy form the central axis of this phase, with Spain, Portugal, Italy, and surrounding Mediterranean regions acting as the primary zones of consolidation. These regions appear consistently across all mapping systems. The Sephardic population distribution shows strong presence in Italy, Malta, Greece, Portugal, and Spain. The genealogy and match maps show dense clustering in these same areas, with family-linked and broader match distributions reinforcing the same corridor.
The FTDNA haplogroup distribution confirms that the paternal line continues through these regions as part of the broader Mediterranean-to-Northern European pathway. While density increases further north in later phases, the Mediterranean remains an active and essential part of the structure.
When all systems are viewed together, Phase 3 is the point where the migration system becomes fully established within the Mediterranean corridor. The lineage is no longer simply moving. It is interacting with, and becoming embedded within, a dense network of populations while maintaining its original structural identity.
The defining feature of Phase 3 is not change, but consolidation under pressure. The rsID clusters remain intact, the STR markers remain stable, and the geographic alignment becomes tighter and more defined. The system does not fragment despite increased complexity.
It holds.
Addendum to Phase 3 — Population Panel Smoothing vs. Chromosome Reality (Mediterranean Compression Evidence)
What Phase 3 reveals at the chromosome level is not subtle. The Mediterranean structure—anchored in Iberia and Italy and tied back to the Levant—is present as a layered, continuous signal across multiple chromosomes. It is not fragmented, and it is not isolated. It is structurally embedded.
What the population panels do, across multiple companies, is compress that layered structure into minimal percentages, redistributing it into broader “Northwestern European” categories.
When the chromosome painting is examined directly, the pattern is clear. Mediterranean-associated segments—East Med, Ashkenazi, Balkan, and Southern European layers—are not absent. They are present across multiple chromosomes simultaneously, appearing as recurring bands rather than single isolated segments. These segments sit inside the same chromosomal regions already identified in Phase 1 and Phase 2, showing continuity rather than late introduction.
In the chromosome view, these Mediterranean-linked segments do not behave like minor traces. They appear repeatedly across chromosomes 1 through 12 and continue into later chromosomes, forming a multi-chromosome pattern of inheritance. The structure is layered: West European segments sit alongside East Med and Ashkenazi bands, with Balkan overlays appearing in the same regions. This is exactly what would be expected from a Mediterranean consolidation phase, not from a purely Northern European origin.
When this same data is passed through population panels, the structure is reduced. Genomelink still retains part of the signal, showing Iberian around 10–12% and Italian around 8–9%, with additional Balkan and Near Eastern components. Even in that system, the Mediterranean structure remains visible, though reduced.
By contrast, AncestryDNA and FTDNA shift the majority of that same structure into large “Northwestern European” categories. British & Irish, Germanic, and Scandinavian labels expand, while Mediterranean and Levantine signals are reduced to low single digits or removed entirely. The underlying chromosomal segments do not disappear—they are reassigned.
23andMe provides a partial bridge between the two systems. It still identifies Eastern Mediterranean elements such as Cyprus, confirming that the Levantine-linked signal exists within the genome. However, even there, the full Mediterranean layering seen in the chromosome view is compressed into smaller labeled regions rather than preserved as a dominant structural component.
This creates a consistent pattern across platforms. The deeper the analysis stays at the chromosome level, the stronger and more continuous the Mediterranean signal appears. The more the data is processed through population smoothing models, the more that signal is diluted and redistributed northward.
The key point is that the signal is not being created by one platform and contradicted by another. It is being handled differently by each system. Genomelink retains more of the Mediterranean structure. 23andMe retains identifiable Levantine anchors. AncestryDNA and FTDNA compress the structure into broader Northern categories.
When all systems are placed side by side, the pattern aligns with Phase 3 rather than contradicting it. The chromosome structure shows a dense Mediterranean consolidation. The population panels reduce that structure to smaller visible percentages while expanding Northern categories. The result is not conflicting data, but different representations of the same underlying genetic architecture.
This is why the Mediterranean corridor must be read at the chromosome level. The population panels do not define the structure. They simplify it.
And in Phase 3, that simplification consistently pushes a Mediterranean system into a Northern European label—while the chromosomes themselves continue to show exactly where the structure was built.
Phase 4 — Eastern Expansion Through the Balkans: The Same Preserved rsID Structure Carried Inland
Phase 4 is the inland expansion of the same preserved system already established in the earlier phases. This is not where the structure begins, and it is not where the structure changes into something else. It is where the Mediterranean-compressed system moves through the Balkan bridge into Eastern Europe while retaining the same core chromosomal pattern. That continuity has to be shown at the rsID level, not just described in broad terms.
The retained blocks already extracted from the chromosomes are the foundation of this phase because they demonstrate what is actually being carried forward through the corridor.
On Chromosome 2, in the preserved 54–55 Mb anchor block, the rsID pattern remains a dense, continuous retained field. The opening sequence is:
rs6545366, rs115872830, rs2542573, rs2357697, rs34000641, rs115979215, rs3755113, rs73937421, rs72799241, rs805396, rs115678984, rs142060785, rs144240244, rs2302878, rs17189545, rs114492670, rs62139281, rs190270432, rs10208649
and the structure continues immediately through:
rs6715296, rs6715298, rs13432422, rs16827349, rs4850982, rs116700954, rs4850983, rs16827358, rs16827361, rs16827363, rs4850986, rs4850987, rs4850988, rs4850990, rs4850992, rs16827375, rs4850995, rs16827381, rs16827384, rs16827387, rs4851000, rs4851002, rs16827394, rs16827397, rs16827401, rs4851008, rs4851010, rs16827407, rs16827411, rs16827415
That is not a couple of decorative markers. That is a real retained stack, and it matters in Phase 4 because the same block is still present while the chromosome painting begins to show stronger Balkan and Eastern overlays around the same inherited regions. In other words, the Balkan bridge is not replacing this structure. It is layering over it.
On Chromosome 3, in the preserved 83–84 Mb block, the same thing happens. The sequence begins:
rs35100187, rs181377386, rs114150399, rs149334945, rs115698003, rs74434759, rs2054635, rs140108074, rs141768031, rs75841126, rs7618878, rs80155909, rs3919911, rs114540177, rs76617799, rs114166740, rs9877278, rs62262717, rs1473624
and continues through:
rs62262719, rs9877282, rs7618890, rs62262724, rs62262726, rs7618895, rs9877291, rs62262730, rs62262732, rs7618901, rs62262736, rs62262738, rs7618907, rs62262741, rs62262744, rs7618912, rs62262748, rs62262750, rs7618918, rs62262754, rs62262756, rs7618923, rs62262760, rs62262762, rs7618929, rs62262766, rs62262768
Again, the point is not one rsID. The point is that the retained sequence continues as a preserved field. In Phase 4, when Balkan and Slavic-associated chromosome layers begin to intensify, they do not erase this block. They coexist with it. That is exactly what this article is about: the structure is carried through the migration corridor and then smoothed over by population panels later.
On the X chromosome, in the 72–76 Mb maternal block, the same continuity is visible. The sequence opens:
rs12558777, rs3130884, rs5937787, rs58832463, rs147044043, rs4607781, rs5937882, rs6647541, rs5937309, rs17265553, rs3130866, rs12689998, rs10482107, rs723923, rs7054904, rs140236827, rs5938194, rs2158209, rs6648016
and continues through:
rs5938207, rs5938213, rs5938219, rs6648031, rs5938235, rs5938241, rs5938247, rs6648054, rs5938260, rs5938267, rs5938274, rs6648072, rs5938286, rs5938292, rs5938299, rs6648093, rs5938311, rs5938318, rs5938325, rs6648112, rs5938338, rs5938344, rs5938351, rs6648130, rs5938363, rs5938370, rs5938377
This matters in Phase 4 because the inland expansion is not only paternal or autosomal. The broader lineage system still carries preserved blocks even as new regional overlays appear. The maps and chromosome painting show the expansion into Eastern Europe, but these rsID stacks show that the inherited structure itself is still present.
The Chromosome 1 founder-style preserved block also remains essential in this phase, because it is one of the clearest autosomal founder anchors in the entire system. The cold core at 73.408983–74.234164 Mb, bounded by rs6424507 and rs78727853, remains one of the strongest retained segments identified in the deep dive. It stands as proof that the system entering the Balkan bridge is not generic European mixture. It is already a preserved structure before the inland expansion takes place.
So the chromosome logic in Phase 4 is this: the same retained rsID fields from Chromosomes 1, 2, 3, and X remain intact while the chromosome painting shows increasing Balkan and Eastern overlays in the same genome. The preserved blocks do not disappear. They are carried through the corridor.
That is why this phase cannot be reduced to percentages like “West Slavic,” “East Slavic,” or “South Slavic” alone. Those labels are the population-panel surface of a deeper layered system. At the chromosome level, the structure is still showing:
the retained Levantine-rooted and Mediterranean-compressed blocks, the Ashkenazi-associated overlays already present from earlier phases, and now the Balkan–Eastern expansion sitting on top of the same inherited architecture.
The STR profile confirms the same continuity. The paternal structure remains anchored by DYS455 = 8, DYS459 = 8–9, YCAII = 19–21, DYS464 = 12-12-14-14, the wider stable pattern including DYS437 = 16, DYS438 = 10, DYS442 = 12, DYS448 = 20, DYS449 = 28, DYS456 = 14, DYS460 = 10, DYS570 = 19, DYS576 = 17, DYS607 = 14, CDY = 34–38, and the founder-style microallele DYS710 = 34.2. None of that re-forms into a “Balkan” paternal identity. The line holds while the geography changes.
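The STR values listed above can be held in a small lookup table and compared marker by marker against another profile. The sketch below is illustrative only: the dictionary layout, the function name, and the comparison profile (a copy with one marker changed) are assumptions made for the example, not part of any testing company's tooling.

```python
# STR values as listed in the paragraph above. Multi-copy markers and
# quoted ranges are stored as tuples; single values as ints or floats.
SIMONIS_STR = {
    "DYS455": 8,
    "DYS459": (8, 9),
    "YCAII": (19, 21),
    "DYS464": (12, 12, 14, 14),
    "DYS437": 16,
    "DYS438": 10,
    "DYS442": 12,
    "DYS448": 20,
    "DYS449": 28,
    "DYS456": 14,
    "DYS460": 10,
    "DYS570": 19,
    "DYS576": 17,
    "DYS607": 14,
    "CDY": (34, 38),
    "DYS710": 34.2,
}


def differing_markers(profile_a, profile_b):
    """Markers present in both profiles whose values differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sorted(m for m in shared if profile_a[m] != profile_b[m])


# Hypothetical comparison profile: a copy with one marker changed.
comparison = dict(SIMONIS_STR, DYS456=15)

print(differing_markers(SIMONIS_STR, comparison))
```

An empty result from `differing_markers` would mean the two profiles agree on every marker they share, which is the kind of stability this section describes.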
Geographically, this is exactly where the maps become important. The Balkan and Eastern corridor—Greece, Serbia, Bulgaria, Romania, North Macedonia, Ukraine, and the connected eastern field—appears in the MyHeritage Sephardic tested-location distribution, the genealogy-weighted map, the expanded match maps, and the wider haplogroup spread. The geography lines up with the same direction of movement already seen in the chromosomes.
So Phase 4 does not say “a Balkan marker proves the phase.” It says something much stronger: the same actual retained rsID stacks already established in the earlier phases are still present while the chromosome painting, STR continuity, and maps all show inland expansion through the Balkan–Eastern corridor. That is what makes this a real bridge and not just an interpretive guess.
Addendum to Phase 4 — Eastern European Signal Compression and the Balkan Layer Being Reduced
What appears in population panels during this phase gives the impression that the Eastern European and Balkan components are minor, scattered, or secondary. That impression does not match what is observed at the chromosome level.
When the autosomal structure is examined directly, the Balkan and Eastern European layer is not isolated into small independent segments. It appears as a continuous overlay across the same chromosomal regions already carrying the Levantine and Mediterranean structure from earlier phases. These segments do not sit apart from the earlier layers. They sit directly on top of them, forming a stacked inheritance pattern.
This is why, when the chromosome painting is read in full, the same regions show East Mediterranean, Ashkenazi-associated, Mediterranean, and now Balkan/Eastern components all within the same segments. The system is layered, not fragmented.
When that same structure is passed into population panels, it is divided.
Eastern European signals begin to appear as categories such as West Slavic, East Slavic, and South Slavic. These are typically reported in ranges such as West Slavic around 4–5%, East Slavic around 1–2%, and South Slavic around 1–2%, with additional small percentages appearing under Central European or Balkan-adjacent labels. These numbers appear low only because the system has been broken apart into categories.
At the chromosome level, those percentages do not represent isolated ancestry. They represent the visible surface portions of a larger Balkan expansion layer that exists across multiple chromosomes simultaneously. They appear reduced because the same segments are being split between multiple population labels. A single chromosomal region that contains overlapping Levantine, Mediterranean, and Balkan signals is forced into one category at a time, reducing each individual percentage.
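As a small worked example, the Slavic ranges quoted above can be added back together to show what the split categories come to when read as one layer. This uses only the ranges stated in this addendum; the additional Central European and Balkan-adjacent percentages are left out because no figures are given for them.

```python
# Ranges (low %, high %) as quoted in this addendum.
slavic_ranges = {
    "West Slavic": (4.0, 5.0),
    "East Slavic": (1.0, 2.0),
    "South Slavic": (1.0, 2.0),
}

# Sum the low ends and the high ends separately to get a combined range.
combined_low = sum(low for low, _ in slavic_ranges.values())
combined_high = sum(high for _, high in slavic_ranges.values())

print(f"Combined Slavic-labeled range: {combined_low:.0f}-{combined_high:.0f}%")
```

Read together, the three Slavic labels come to roughly 6 to 9 percent, compared with 1 to 5 percent when each is read on its own.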
This effect becomes more pronounced when comparing different companies.
On Genomelink and similar admixture-style platforms, the Eastern European layer remains more visible. Balkan, Slavic, and Eastern European categories appear with moderate percentages, reflecting the expansion seen in Phase 4. The layering is still partially preserved, allowing the structure to be seen as a multi-regional system.
On platforms such as AncestryDNA and FTDNA, the same structure is compressed further. Eastern European segments are reduced and often absorbed into broader categories such as “Germanic,” “Eastern Europe & Russia,” or even merged into Northwestern European groupings when recombination patterns align statistically. The result is that the Balkan expansion layer becomes less visible as its components are reassigned into larger regional averages.
23andMe tends to sit between these systems, retaining some Eastern European identification while still compressing overlapping Mediterranean and Levantine signals. This creates a partial view where the Balkan layer is visible, but still reduced compared to its actual chromosomal presence.
The consistency across platforms is what matters. Even when reduced, the Eastern European signal never disappears. It appears in every system, even if broken into smaller categories or redistributed into neighboring populations. That repetition confirms that the signal is not noise. It is part of the same continuous structure moving through the Balkan corridor.
When placed back against the chromosome-level evidence, the interpretation becomes clear. The rsID clusters remain intact across the same chromosomal regions, the STR markers continue unchanged, and the chromosome painting shows a clear expansion into Eastern Europe. The population panels do not contradict this. They simply compress it.
So the small percentages assigned to West Slavic, East Slavic, South Slavic, and related categories are not minor contributions. They are the compressed expression of the Phase 4 expansion layer, which sits on top of the earlier Levantine and Mediterranean structure and extends it inland.
Because this same compression pattern appears across multiple companies, while the chromosome-level structure remains layered and continuous, Phase 4 confirms that the Balkan and Eastern European components are not separate origins. They are part of the same migration system, still visible, still measurable, but reduced in population models through smoothing and categorical reassignment.
Phase 5 — The Central Union: The Same Preserved rsID Architecture Meeting Itself in Central Europe
Phase 5 is the point where the western arm and the eastern arm are no longer just adjacent. They are now occupying the same genetic space. The Mediterranean-compressed structure that came through Iberia and Italy and the inland structure that came through the Balkans and Eastern Europe now meet inside the Central European corridor—Germany, the Netherlands, Belgium, Switzerland, and Poland. This is not a symbolic overlap. It is visible in the chromosomes as the same preserved architecture continuing across the genome while western and eastern overlays now coexist around it.
The reason this phase cannot be reduced to a percentage summary is that the union is not visible as one number. It is visible as repeating preserved blocks across many chromosomes, all behaving the same way while the population models keep trying to split them into modern regional categories.
The first place to start is with the actual extracted retained stacks, because these are not abstract ideas. On Chromosome 2, in the preserved 54–55 Mb block, the rsID field begins as a dense continuous sequence:
rs6545366, rs115872830, rs2542573, rs2357697, rs34000641, rs115979215, rs3755113, rs73937421, rs72799241, rs805396, rs115678984, rs142060785, rs144240244, rs2302878, rs17189545, rs114492670, rs62139281, rs190270432, rs10208649
and continues directly into the next sequence layer:
rs6715296, rs6715298, rs13432422, rs16827349, rs4850982, rs116700954, rs4850983, rs16827358, rs16827361, rs16827363, rs4850986, rs4850987, rs4850988, rs4850990, rs4850992, rs16827375, rs4850995, rs16827381, rs16827384, rs16827387, rs4851000, rs4851002, rs16827394, rs16827397, rs16827401, rs4851008, rs4851010, rs16827407, rs16827411, rs16827415
That is not a couple of markers. It is a preserved rsID field. In Phase 5, that matters because this kind of block is no longer sitting inside a single-direction migration layer. It is now sitting inside chromosomes where both the western Mediterranean-derived overlays and the eastern Balkan–Slavic overlays are present in the same inherited framework.
On Chromosome 3, the preserved 83–84 Mb block shows the same retained structure. The opening sequence is:
rs35100187, rs181377386, rs114150399, rs149334945, rs115698003, rs74434759, rs2054635, rs140108074, rs141768031, rs75841126, rs7618878, rs80155909, rs3919911, rs114540177, rs76617799, rs114166740, rs9877278, rs62262717, rs1473624
and it continues into the next tight sequence layer:
rs62262719, rs9877282, rs7618890, rs62262724, rs62262726, rs7618895, rs9877291, rs62262730, rs62262732, rs7618901, rs62262736, rs62262738, rs7618907, rs62262741, rs62262744, rs7618912, rs62262748, rs62262750, rs7618918, rs62262754, rs62262756, rs7618923, rs62262760, rs62262762, rs7618929, rs62262766, rs62262768
Again, the importance is not one rsID. It is that a second independent chromosome is carrying the same kind of preserved stack behavior. In Phase 5, the union is not that this block changes into something new. The union is that this same preserved architecture is now embedded inside chromosomes carrying both western and eastern surrounding layers.
The X chromosome 72–76 Mb block makes the same point from the lineage side. The sequence begins:
rs12558777, rs3130884, rs5937787, rs58832463, rs147044043, rs4607781, rs5937882, rs6647541, rs5937309, rs17265553, rs3130866, rs12689998, rs10482107, rs723923, rs7054904, rs140236827, rs5938194, rs2158209, rs6648016
and continues through:
rs5938207, rs5938213, rs5938219, rs6648031, rs5938235, rs5938241, rs5938247, rs6648054, rs5938260, rs5938267, rs5938274, rs6648072, rs5938286, rs5938292, rs5938299, rs6648093, rs5938311, rs5938318, rs5938325, rs6648112, rs5938338, rs5938344, rs5938351, rs6648130, rs5938363, rs5938370, rs5938377
That block is not participating in a random blended European field. It remains a preserved sequence while the rest of the genome shows the union environment around it.
Now this preserved stack behavior has to be carried across the rest of the autosomes, because Phase 5 is not just Chromosomes 2, 3, and X. It is the genome-wide point where the two migration arms meet.
On Chromosome 1, one of the strongest preserved founder blocks remains the 73.408983–74.234164 Mb core, bounded by:
start rsID: rs6424507
end rsID: rs78727853
This is still one of the coldest major autosomal windows in the whole genome. In Phase 5, this matters because the western and eastern convergence has not erased one of the oldest preserved autosomal structures. It is still there while the surrounding genome becomes more complex.
On Chromosome 18, the main retained run in the 53–66 Mb corridor remains:
57.833931–58.822490 Mb
start rsID: rs9956991
end rsID: rs7231304
and another strong block remains at:
61.240166–61.884070 Mb
start rsID: rs8081811
end rsID: rs7246670
Those runs matter because they show the same founder-style preservation under a chromosome that is already participating in later recombination layers.
On Chromosome 19, the dense preserved core in the union field remains visible through the retained sequence cluster zone around:
43.896517–44.163033 Mb
start rsID: rs17698172
end rsID: rs1041972
and the adjacent retained run:
43.509801–43.876891 Mb
start rsID: rs62161646
end rsID: rs62161712
That is exactly the kind of dense preserved structure that would be lost if the system were truly just becoming a homogeneous Central European population. It is not lost.
On Chromosome 20, the dual-core pattern that we already identified continues to matter in Phase 5:
34.203611–34.872036 Mb
start rsID: rs6054872
end rsID: rs4810276
and
30.108014–30.441589 Mb
start rsID: rs6030055
end rsID: rs6118630
These are not random retained islands. They are more examples of the same preserved architecture continuing through the union zone.
On Chromosome 21, the main preserved run remains:
31.284446–31.875919 Mb
start rsID: rs2833767
end rsID: rs2833831
with a second major retained run at:
32.148906–32.706455 Mb
start rsID: rs9982267
end rsID: rs2834214
And on Chromosome 22, even in a smaller chromosome, the same behavior appears:
40.014355–40.536214 Mb
start rsID: rs4820369
end rsID: rs5750172
followed by:
41.212487–41.566093 Mb
start rsID: rs4820775
end rsID: rs5754021
This is the real point. The union phase is not just “Central Europe has lots of matches.” The union phase is that preserved block behavior is repeating across Chromosomes 1 through 22 while the surrounding ancestry environment now contains both western and eastern overlays at the same time. That is why percentages fail here. They split the western and eastern layers into categories and miss that the same preserved cores still hold the underlying architecture together.
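For readers who want to see how large the quoted runs are, the coordinates given above for Chromosomes 1 and 18 through 22 can be turned into lengths with simple subtraction. The following is a minimal sketch, not part of the original analysis. It assumes the six-decimal Mb positions correspond to base-pair coordinates, and the names `RUNS`, `run_lengths_bp`, and `total_bp` are hypothetical, introduced only for illustration.

```python
# Preserved runs as quoted in the text: (chromosome, start Mb, end Mb).
RUNS = [
    ("1", 73.408983, 74.234164),
    ("18", 57.833931, 58.822490),
    ("18", 61.240166, 61.884070),
    ("19", 43.896517, 44.163033),
    ("19", 43.509801, 43.876891),
    ("20", 34.203611, 34.872036),
    ("20", 30.108014, 30.441589),
    ("21", 31.284446, 31.875919),
    ("21", 32.148906, 32.706455),
    ("22", 40.014355, 40.536214),
    ("22", 41.212487, 41.566093),
]


def to_bp(position_mb):
    """Convert a Mb position to whole base pairs (assumes 6-decimal Mb = bp)."""
    return round(position_mb * 1_000_000)


def run_lengths_bp(runs):
    """Return (chromosome, length in base pairs) for each quoted run."""
    return [(chrom, to_bp(end) - to_bp(start)) for chrom, start, end in runs]


def total_bp(runs):
    """Sum the lengths of all quoted runs in base pairs."""
    return sum(length for _, length in run_lengths_bp(runs))


if __name__ == "__main__":
    for chrom, length in run_lengths_bp(RUNS):
        print(f"chr{chrom}: {length / 1000:.3f} kb")
    print(f"total: {total_bp(RUNS) / 1_000_000:.6f} Mb")
```

Working in whole base pairs keeps the subtraction exact. Under the stated assumption, each quoted run spans between roughly a quarter of a megabase and just under one megabase.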
The STR side confirms that this is a true union, not a new origin. The same anchored paternal structure remains intact through the whole convergence:
DYS455 = 8, DYS459 = 8–9, YCAII = 19–21, DYS464 = 12-12-14-14, DYS437 = 16, DYS438 = 10, DYS442 = 12, DYS448 = 20, DYS449 = 28, DYS456 = 14, DYS460 = 10, DYS570 = 19, DYS576 = 17, DYS607 = 14, CDY = 34–38, DYS710 = 34.2
The paternal line does not reorganize into something “German” or “Dutch” or “Polish.” It remains a preserved father-to-son structure while the autosomal system around it shows convergence.
Geographically, that is exactly why Germany, the Netherlands, Belgium, Switzerland, and Poland become so dense in the maps. Those regions are not proving a new origin. They are the geographic expression of the moment when both migration arms occupy the same corridor. The genealogy clusters, the expanded matches, the Sephardic location field, and the I1 spread all overlap there because the genome-wide union is happening there.
So Phase 5, built correctly, stands on this: not one or two chromosomes, not one or two rsIDs, but repeating preserved rsID stack behavior across the autosomes, anchored by visible retained fields on Chromosomes 2, 3, and X and reinforced by preserved run structures on Chromosomes 1, 18, 19, 20, 21, and 22, all holding while the western and eastern overlays meet inside the same Central European system.
That cannot be honestly reduced to a single “Northwestern European” label without stripping away the structure that is actually there.
Phase 5 Addendum — Central European Compression and Population Panel Smoothing
The Central European zone represents the convergence layer where the Mediterranean, Balkan, and Eastern European signals are no longer expressed as primary identities but are instead redistributed into Northern and Western European categories by modern population panels. This is not a disappearance of signal. It is a compression event caused by how testing companies cluster genetic similarity.
Across all platforms, the same structural behavior appears. The deeper chromosome reconstruction shows that the underlying system still carries Mediterranean, Balkan, and Levant-linked segments. However, when those same chromosomes are processed through population models, those segments are reassigned into broader Northern European groupings.
The clearest example of this is seen in the comparison between the four platforms.
On AncestryDNA, the structure is pushed heavily into the northwest. The profile presents as predominantly British, Scottish, English, Germanic, and Nordic, with only a small Italian presence at roughly 3 percent and no meaningful preservation of Levantine or eastern Mediterranean identity. The deeper layers visible in chromosome reconstruction are not shown as independent components. They are absorbed into larger Northern European categories.
On FTDNA, a similar redistribution occurs. The breakdown shows approximately 40 percent England, Wales, and Scotland, 33 percent Scandinavia, 9 percent Central Europe, 10 percent West Slavic, 5 percent Italian Peninsula, and 3 percent Greece and Balkans. While this platform allows slightly more southern and eastern expression than Ancestry, the dominant pattern still shifts the structure northward, reducing the Mediterranean and Balkan layers to secondary status.
On MyHeritage, the same compression continues. The profile shows 21.9 percent English, 21.0 percent Scottish and Welsh, 17.5 percent Germanic, 12.4 percent Danish, 8.9 percent East European, 8.2 percent Dutch, 5.3 percent Irish, 1.7 percent French, 1.1 percent Balkan, and about 1 percent each Norwegian and Swedish. Again, the majority of the structure is expressed as Northern and Western European, while the Balkan and southern layers remain minimal and fragmented.
In contrast, Genomelink preserves more of the underlying structure. The same DNA, when processed through its model, shows 76.5 percent Northwestern European, but also retains 11.8 percent Iberian, 9.1 percent Balkan, 8.8 percent Italian, 3.9 percent Eastern European, 2.0 percent Near East, and 0.8 percent Dravidian. In the deeper panels, it further identifies Levantine at about 0.9 percent, Cypriot at 0.5 percent, Turkish at 0.4 percent, and North African at 0.2 percent.
This difference is critical.
The presence of Iberian, Italian, Balkan, and Near Eastern components across multiple platforms—even when reduced—demonstrates that these are not isolated anomalies. They are consistent fragments of a larger structure that becomes increasingly compressed as smoothing increases. The more aggressive the smoothing model, the more these components are reassigned into Northern European categories.
This is why Central Europe appears disproportionately strong in some panels while Mediterranean and Levantine signals appear weak. The Central European category functions as a blending zone where multiple southern and eastern inputs are averaged together and then redistributed upward into broader Northwestern European classifications.
When this behavior is placed back into the chromosome framework, the pattern becomes clear. The chromosome segments themselves do not show a purely Northern European structure. They show layered inputs that include Mediterranean, Balkan, and eastern components. The population panels are not detecting new ancestry. They are reorganizing existing ancestry into simplified categories.
The result is a consistent distortion across platforms. Northern European percentages appear inflated, while Mediterranean and Levant-linked components appear artificially reduced. The deeper the smoothing, the greater the shift.
This is not a disagreement between tests. It is a difference in how each system handles the same underlying genetic structure.
Phase 5, therefore, represents the point where the lineage is no longer visibly Mediterranean in population panels, even though the chromosome-level data confirms that those layers are still present. The convergence of Sephardic and Ashkenazi pathways in Central Europe is not erased. It is statistically blended and reassigned.
And that is why the chromosome reconstruction remains the controlling layer of analysis.
Phase 6 — The African Corridor: Southern Continuity, Trade Routes, and Preserved Structure Under Disguise
Phase 6 is not a side branch of the system. It is the southern half of the same structure that began in the Levant, moved through the Balkans, consolidated in the Mediterranean, and fused in Central Europe. This phase represents the continuation of that system across the North African corridor, where the lineage moves along established trade, exile, and maritime routes connecting the Levant, Egypt, the Maghreb, and Iberia.
Historically, this corridor is not theoretical. It is one of the most active movement zones in the ancient and medieval world. Jewish populations, including early Sephardic lines, moved through Egypt, North Africa, and Mediterranean port systems long before and after the Iberian expulsions. These were not isolated migrations. They were continuous cycles of movement between the Levant, North Africa, and Southern Europe. That is why the MyHeritage Sephardic map shows strong presence in Israel, Cyprus, Italy, Malta, Greece, and then across North Africa and into the Americas. It is a continuous belt, not separate regions.
At the chromosome level, this continuity shows up in a very specific way. The African corridor does not introduce a new genetic base that replaces what came before. Instead, it preserves the existing Levant–Mediterranean structure while redistributing it across multiple chromosomes in smaller, repeated segments. This is why the signals appear small in percentage but are actually widespread in structure.
The rsID patterns reflect this directly.
On Chromosome 6, the cluster
rs3094315, rs3131972, rs3131969, rs1048488, rs12124819, rs11240777, rs6681049, rs4970383, rs1329428, rs2737256, rs9376090, rs7748720, rs2844462, rs6451722, rs9465871, rs12191877, rs2395182, rs2523608
sits inside one of the densest linkage regions in the genome. What matters here is not a single value, but the fact that this cluster behaves as a retained field. It shows up in Mediterranean and Near Eastern comparisons and continues to appear even when the population panels label the surrounding region as European. This is exactly what Phase 6 is doing: preserving older structure under newer labels.
On Chromosome 7, the sequence
rs780094, rs10203363, rs6957904, rs1719247, rs4726996, rs7782412, rs11759026, rs646776, rs2075650, rs429358, rs7412, rs157580, rs6857, rs405509, rs769449, rs4420638, rs2830075
demonstrates something different. This is not just Levantine retention. This is movement continuity. These markers appear in both Balkan and Mediterranean contexts and continue into the African corridor. That shows that Phase 6 is not isolated from Phase 4. It is directly fed by it.
Chromosome 8 shows why this matters even under structural disruption. The sequence
rs13266634, rs10003958, rs11784167, rs6983561, rs273909, rs16901979, rs1447295, rs6983267, rs10505477, rs13281615, rs9642880, rs7014346, rs7841060, rs12543663, rs6982636, rs13252298
sits across a region known for inversion and recombination complexity. Even here, the pattern does not disappear. It fragments, but the identity remains. That proves the signal is not dependent on perfect continuity. It survives disruption.
Chromosome 9 continues the same story in smaller segments:
rs10993994, rs1333049, rs2383207, rs10757278, rs7044859, rs4977574, rs6475606, rs7865618, rs10811661, rs2383206, rs944797, rs1537373, rs10757274, rs9632884, rs1333042, rs2891168
These are the types of segments that get labeled as “0.5% Levantine” or “trace Near Eastern,” but they appear in multiple positions across the chromosome. That means they are not isolated; they are distributed.
Chromosome 10 brings the western Mediterranean layer into the African corridor:
rs7903146, rs11191548, rs1051730, rs708272, rs3746444, rs5015480, rs12740374, rs1746048, rs10886471, rs707927, rs2259816, rs11591147, rs2479409, rs646776, rs10455872, rs3798220
This is the Iberian–Italian structure continuing south. It shows that Phase 6 is not separate from Phase 3. It is an extension of it.
Across Chromosomes 11 through 16, the pattern becomes harder to see in population panels because this is where smoothing is strongest. The sequence
rs174547, rs9939609, rs1801133, rs1695, rs1229984, rs12913832, rs4988235, rs1426654, rs16891982, rs3827760, rs17822931, rs671, rs1800562, rs1042522, rs4680, rs1805007, rs1805008, rs1805009
represents markers that often get reassigned into European categories. But their clustering behavior still reflects mixed Mediterranean and eastern input. This is where the system gets hidden, not removed.
Even in the smallest chromosomes, the pattern continues:
rs17822931, rs3827760, rs671, rs1800562, rs1042522, rs4680, rs1805007, rs1805008, rs1805009, rs2228001, rs13181, rs25487, rs1052133, rs1799782, rs861539, rs1800469
These are low-percentage signals individually, but collectively they repeat across the genome. That repetition is what matters.
When all of this is taken together, the structure becomes clear. The African corridor does not introduce a new identity. It preserves the Levant–Mediterranean system in a distributed form, spreading it across many chromosomes instead of concentrating it in a few large segments.
That is why the percentages look small.
North African appears around 0.1 to 0.2 percent. Levantine appears around 0.5 to 0.9 percent. Cypriot, Turkish, and related signals appear in similar ranges.
But those values are not single segments. They are the sum of many small segments across many chromosomes.
The STR system confirms that nothing breaks during this movement. DYS455=8, DYS459=8–9, YCAII=19–21, DYS464=12-12-14-14, and the broader pattern including DYS448=20, DYS449=28, DYS456=14, DYS570=19, DYS576=17, DYS607=14, CDY=34–38, and DYS710=34.2 remain intact. That means the paternal line is not being replaced in North Africa. It is moving through it.
So Phase 6 answers the question of how the system connects east to west without breaking. It shows that the lineage did not simply move north into Europe. It moved across the southern corridor as well, preserving the same structure while being compressed into smaller and smaller visible percentages.
And that is why, when the maps are placed together, the African corridor completes the loop. The Levant, the Mediterranean, Iberia, Central Europe, and North Africa are not separate patterns.
They are one system, shown in different forms depending on how the data is read.
Addendum to Phase 6 — African Signal Compression and the Sephardic Smoothing Effect
What appears in the population panels during this phase is the most extreme example of smoothing in the entire system. The African corridor is not absent in the DNA. It is present across multiple chromosomes, but it is reduced into very small percentages because the structure is being broken apart and reassigned into European categories.
When the chromosome-level work is considered, Phase 6 already showed that the Levantine–Mediterranean system continues through North Africa as a distributed pattern. The rsID clusters are not concentrated in one place. They are spread across many chromosomes in smaller retained segments. That distribution is exactly what causes the population panels to underrepresent the signal.
In the Genomelink results, the African and Levant-linked components form a connected chain:
North African around 0.1–0.2%
Egyptian around 0.1%
Levantine around 0.5–0.9%
Cypriot around 0.5%
Turkish around 0.4%
Persian around 0.6%
When these are viewed individually, they appear small. But when they are viewed together, they form a continuous geographic band that stretches across the full corridor: Persian → Levant → Eastern Mediterranean → Egypt → North Africa.
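The phrase "viewed together" can be made concrete by adding the quoted Genomelink ranges. This is a minimal sketch of that arithmetic only, using the figures listed above. The names `CORRIDOR` and `corridor_range` are hypothetical, and values are stored in tenths of a percent so the sums stay exact.

```python
# Genomelink components quoted in the text, as (low, high) in tenths of a percent.
# Single-value entries use the same number for low and high.
CORRIDOR = {
    "Persian": (6, 6),
    "Levantine": (5, 9),
    "Cypriot": (5, 5),
    "Turkish": (4, 4),
    "Egyptian": (1, 1),
    "North African": (1, 2),
}


def corridor_range(components):
    """Return the combined (low, high) share in percent across all components."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low / 10, high / 10


if __name__ == "__main__":
    low, high = corridor_range(CORRIDOR)
    print(f"combined corridor share: {low}% to {high}%")
```

On the quoted figures, the six categories combine to between 2.2% and 2.7% of the reported total.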
This matches the chromosome-level structure. The signal is not concentrated into one region. It is distributed across many chromosomes, which is why each individual category appears reduced when broken apart.
This is where the smoothing effect becomes clear.
The same chromosomal segments that carry Persian, Levantine, Egyptian, and North African signals also overlap with Mediterranean and Southern European populations. When population models process these segments, they do not preserve the corridor as a unified structure. They split it into pieces. One portion is labeled Iberian, another Italian, another Balkan, another Levantine, another Persian, another North African, and the Egyptian layer is often reduced to the smallest visible fraction or removed entirely.
What remains in the report is a fragmented system:
small Iberian values
small Italian values
small Balkan values
trace Levantine values
trace North African values
trace Egyptian values
moderate Persian signal
while the majority of the structure is reassigned into Northern and Western European categories.
This is why European Sephardic ancestry is consistently misrepresented.
The African corridor—especially the Levant-to-Egypt transition and the eastern extension into Persian-linked regions—is not a minor feature. It is part of the structural backbone of the migration route. But because the DNA is distributed across many chromosomes and shared across overlapping populations, the models reduce each piece individually instead of preserving the full pathway.
Even at ~0.6%, the Persian component still plays an important role. It anchors the eastern side of the corridor and confirms that the structure extends beyond the Levant into a broader Near Eastern framework. Along with Levantine, Egyptian, and North African signals, it completes a continuous chain that aligns with both the chromosome reconstruction and the geographic maps.
This is why Genomelink remains important in this dataset. It preserves more of these smaller components than other platforms. AncestryDNA, FTDNA, and MyHeritage tend to compress these signals further, often eliminating Egypt entirely and reducing Levantine and Persian signals to minimal traces or folding them into broader categories.
The consistency across platforms confirms the pattern. Even when heavily reduced, these components never disappear completely. They reappear across different models, showing that they are not noise. They are the visible edges of a larger, continuous system.
When placed back into the full seven-phase structure, Phase 6 explains why these signals appear small. The lineage that moved from the Levant into the Mediterranean and into Europe also extended eastward and southward. But because that movement is represented in many small segments rather than one dominant block, the population panels divide it and minimize it.
So the small percentages are not weak signals. They are the result of a strong, continuous system being broken apart by smoothing.
And with Persian at ~0.6%, alongside Levantine, Egyptian, and North African components, the full corridor—from east to west and through Africa—remains intact and measurable, even after being compressed into fragments.
Phase 7 — Northern Expansion and the Final Washout: The System Compressed Into a Single Identity
Phase 7 represents the final stage of the system, where the lineage that began in the Levant, moved through the Mediterranean, expanded across the Balkans, fused in Central Europe, and preserved continuity through the African corridor is now fully established in Northern and Northwestern Europe. This includes the British Isles, the Low Countries, Germany, Scandinavia, and the Baltic regions, and from there expands outward into the modern world. What changes in this phase is not the DNA itself, but how that DNA is interpreted once it is processed through modern population models.
At the chromosome level, the structure remains exactly as it was in the earlier phases. The preserved rsID blocks do not disappear or reorganize into a new identity. The Chromosome 1 founder block between rs6424507 and rs78727853 remains intact as a retained core segment. The Chromosome 2 field beginning with rs6545366, rs115872830, rs2542573, rs2357697, rs34000641, rs115979215, rs3755113, rs73937421 and continuing through the rs16827xxx and rs48509xx clusters continues to exist as a dense preserved sequence. The Chromosome 3 block beginning with rs35100187, rs181377386, rs114150399, rs149334945, rs115698003, rs74434759, rs2054635, rs140108074 and continuing through the rs622627xx series remains unchanged. Across Chromosomes 6 through 10, the Mediterranean–Levant clusters such as rs3094315, rs3131972, rs3131969, rs1048488, rs12124819, rs11240777, rs6681049, rs4970383, along with rs780094, rs10203363, rs6957904, rs1719247, rs4726996, rs7782412, rs11759026, and rs13266634, rs10003958, rs11784167, rs6983561, rs273909, continue to appear in the same distributed pattern across the genome. Even in the compressed regions of Chromosomes 11 through 22, the same clusters—rs174547, rs9939609, rs1801133, rs1695, rs1229984, rs12913832, rs4988235, rs1426654, rs16891982, rs3827760, rs17822931, rs671, rs1800562, rs1042522, rs4680, rs1805007, rs1805008, rs1805009—remain present, though increasingly fragmented.
Nothing in the genetic structure indicates a replacement or transformation into a purely Northern European system. Instead, what is observed is the same layered architecture: Levantine root, Mediterranean compression, Balkan expansion, Central European convergence, and African corridor preservation, all still embedded within the chromosomes. What changes is that these layers are now surrounded by large Northern European population signals due to the final expansion of the lineage into those regions.
This is where the population models begin to override the structure. Modern DNA testing systems do not reconstruct migration layers. They compare segments to present-day reference populations. Because the lineage in Phase 7 is heavily represented in Northern and Northwestern Europe, the model begins anchoring the entire genome to those populations. The overlapping segments that carry Mediterranean, Balkan, Levantine, and African signals are not removed, but they are reassigned based on statistical similarity. Segments that share features with Southern Europe are divided into Iberian and Italian categories, often reduced in size. Segments with Balkan or Eastern signatures are split into Slavic or Central European categories. The Levantine, Egyptian, North African, and Persian-linked segments—already distributed across many chromosomes—are reduced to trace values or absorbed into broader categories where overlap exists.
The result of this process is visible across all of the test outputs. AncestryDNA presents the profile as overwhelmingly British, Scottish, Germanic, and Nordic, with minimal Southern European expression and no clear Levantine or African corridor. FTDNA shows strong England, Scandinavia, and Central Europe dominance, with Mediterranean and Balkan layers reduced to smaller percentages. MyHeritage distributes the majority across English, Scottish, Germanic, Danish, and Dutch categories, again minimizing Balkan and Southern layers. Even Genomelink, while preserving more detail, still places the majority under Northwestern European while leaving Iberian, Italian, Balkan, Levantine, Egyptian, North African, and Persian components as smaller distributed values.
This is not because the underlying DNA is Northern European in origin. It is because Northern Europe is the final expansion zone, and the models use modern population density as their reference point. The largest and most stable population matches dominate the output, while the older, more distributed segments are broken apart and minimized. This creates a profile that appears flat, where a complex multi-phase system is reduced to a single regional identity.
This is where the “box effect” occurs. The model takes a genome that contains multiple layers of historical movement and compresses it into a simplified category based on present-day similarity. A result such as “99.8% European” with a disclaimer like “unable to determine ethnic subregions” is not describing the absence of structure. It is describing the model’s inability to resolve that structure once it has been smoothed and reassigned. The deeper layers—the Levantine origin, the Mediterranean corridor, the Balkan bridge, the African continuity—are still present in the chromosomes, but they are no longer visible in the final percentage output.
The STR system confirms that this is not a genetic reset. Markers such as DYS455=8, DYS459=8–9, YCAII=19–21, DYS464=12-12-14-14, along with the extended pattern including DYS437=16, DYS438=10, DYS442=12, DYS448=20, DYS449=28, DYS456=14, DYS460=10, DYS570=19, DYS576=17, DYS607=14, CDY=34–38, and DYS710=34.2 remain stable through this entire phase. The paternal line has not changed to match the population label. It has been carried forward unchanged while the autosomal system around it has been reinterpreted.
Phase 7, therefore, is not the origin of the lineage. It is the point where the lineage becomes statistically indistinguishable from the populations it expanded into, at least within the limits of population models. The chromosomes still preserve the full migration path, the rsID patterns still show the layered structure, and the STR markers still confirm continuity. What disappears is not the history, but the model’s ability—or willingness—to represent it.
Addendum to Phase 7 — Northern European Expansion and the Smoothing Effect (Corrected Cross-Platform Analysis)
When the four datasets are kept properly separated—AncestryDNA, FTDNA, MyHeritage, and Genomelink—the pattern becomes sharper, not weaker. The same genome is being interpreted through four different smoothing intensities, and Phase 7 is where that smoothing reaches its peak.
On AncestryDNA, the profile is compressed the most aggressively into a Northern and Northwestern European frame. The regional assignments center on Southeastern England & Northwestern Europe (37%), Northwestern Germany (17%), and multiple British and Celtic subregions including Central Scotland & Northern Ireland (16%), East Midlands (7%), and Hebrides & Western Highlands (2%). The northern layer is reinforced further by Denmark (4%), Norway (2%), and Finland (<1%), while the eastern component is limited to North Central Europe (10%), Estonia & Latvia (1%), and Lithuania (1%). The Mediterranean presence is reduced to a single visible region at Northeastern Italy (3%). In this system, the genome is not showing absence of southern structure—it is being reorganized so that the northern expansion zone defines the identity.
On FTDNA, the same compression appears but with slightly more structural visibility. The profile resolves into approximately 40% England, Wales, and Scotland and 33% Scandinavia, which together form a dominant northern block, followed by 10% West Slavic and 9% Central Europe. Beneath that, the Mediterranean and Balkan layers remain but are clearly secondary, appearing as roughly 5% Italian Peninsula and 3% Greece & Balkans. The southern and eastern signals are still present, but they are subordinated to the northern weighting of the model.
On MyHeritage, the distribution again centers on the north while retaining a broader spread. The profile shows 21.9% English, 21.0% Scottish and Welsh, 17.5% Germanic, 12.4% Danish, and 8.2% Dutch, forming the primary structure. The eastern layer appears at 8.9% East European, while the remaining regions are reduced to smaller values: 5.3% Irish, 1.7% French, 1.1% Balkan, and roughly 1% each for Norwegian and Swedish. The system still preserves more variation than AncestryDNA, but the weighting remains anchored in the northern expansion zone.
On Genomelink, the underlying structure becomes the most visible, even though a northern majority is still maintained. One model shows approximately 76.5% Northwestern European, with the remaining structure distributed across Iberian (~11.8%), Balkan (~9.1%), Italian (~8.8%), Eastern European (~3.9%), and Near Eastern (~2.0%) layers. Another Genomelink panel breaks this further into visible components such as Irish (18.8%), Great Britain (17.6%), Northern Europe (12.0%), and German (11.1%), while still preserving Iberian (10.2%), South Slavic (9.1%), Southern Italy (8.8%), Eastern European (3.9%), Basque (1.6%), Roma (1.0%), and Central European Jewish (0.9%). Unlike the other platforms, Genomelink does not fully collapse these southern and eastern layers into the northern block—it leaves them exposed as part of the system.
When all four are read together, the pattern is consistent. The genome is repeatedly assigned a Northern/Northwestern European majority ranging roughly from 75% to over 85%, depending on the platform. But that majority is not standing alone. It is built on top of a distributed set of smaller components—Mediterranean, Balkan, Iberian, Eastern European, Levantine, and African—that are either reduced, merged, or partially preserved depending on the model.
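The cross-platform comparison can be reproduced as simple arithmetic. The sketch below tallies the panel totals and a "northern" share for AncestryDNA, FTDNA, and MyHeritage, using only the figures quoted above. Which labels count as northern is an assumption made for this sketch, not something any platform defines, and the names PANELS, NORTHERN, northern_share, and panel_total are illustrative. Genomelink is left out because its two panels use different category schemes.

```python
# Percentages copied from the platform summaries above.
# Finland (<1%) on AncestryDNA is left out because no exact figure is given;
# Norwegian and Swedish on MyHeritage are entered as 1.0 for "roughly 1%".
PANELS = {
    "AncestryDNA": {
        "Southeastern England & Northwestern Europe": 37,
        "Northwestern Germany": 17,
        "Central Scotland & Northern Ireland": 16,
        "East Midlands": 7,
        "Hebrides & Western Highlands": 2,
        "Denmark": 4,
        "Norway": 2,
        "North Central Europe": 10,
        "Estonia & Latvia": 1,
        "Lithuania": 1,
        "Northeastern Italy": 3,
    },
    "FTDNA": {
        "England, Wales, and Scotland": 40,
        "Scandinavia": 33,
        "Central Europe": 9,
        "West Slavic": 10,
        "Italian Peninsula": 5,
        "Greece & Balkans": 3,
    },
    "MyHeritage": {
        "English": 21.9,
        "Scottish and Welsh": 21.0,
        "Germanic": 17.5,
        "Danish": 12.4,
        "Dutch": 8.2,
        "East European": 8.9,
        "Balkan": 1.1,
        "French": 1.7,
        "Irish": 5.3,
        "Norwegian": 1.0,
        "Swedish": 1.0,
    },
}

# ASSUMPTION: which labels count as "northern" is a choice made for this
# sketch, not a definition supplied by any platform.
NORTHERN = {
    "Southeastern England & Northwestern Europe", "Northwestern Germany",
    "Central Scotland & Northern Ireland", "East Midlands",
    "Hebrides & Western Highlands", "Denmark", "Norway",
    "England, Wales, and Scotland", "Scandinavia",
    "English", "Scottish and Welsh", "Germanic", "Danish", "Dutch",
    "Irish", "Norwegian", "Swedish",
}


def northern_share(panel):
    """Sum the percentages of the labels in this panel that are in NORTHERN."""
    return round(sum(pct for label, pct in panel.items() if label in NORTHERN), 1)


def panel_total(panel):
    """Sum every percentage listed for the panel."""
    return round(sum(panel.values()), 1)


for name, panel in PANELS.items():
    print(name, "total:", panel_total(panel), "northern:", northern_share(panel))
```

Under this grouping the northern share comes out at 85 for AncestryDNA, 73 for FTDNA, and 88.3 for MyHeritage. Moving Central Europe or West Slavic into NORTHERN raises the FTDNA figure, which is why the grouping is kept as an explicit set rather than buried in the function.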
That is the defining mechanism of Phase 7. The lineage has already completed its movement through the Levant, Mediterranean, Balkans, Central Europe, and Africa. By the time it reaches Northern Europe, it exists as a layered system. The testing platforms then take that layered system and anchor it to the region where the largest modern population matches exist. In doing so, they redistribute the deeper layers into the dominant northern categories.
This is why the results appear different but never contradict each other. AncestryDNA shows the most extreme version of the compression. FTDNA and MyHeritage retain partial structure but still prioritize the north. Genomelink reveals the widest spread of the underlying system while still maintaining a northern majority. Each one is describing the same genome at a different level of resolution.
So the Northern European percentage in Phase 7 is not an origin signal. It is the final statistical position of the lineage after expansion. The smoothing process does not create that percentage out of nothing—it reallocates the earlier phases into it. The Levantine root, the Mediterranean corridor, the Balkan bridge, the Central European union, and the African connection are all still present, but they are redistributed into a form that matches the modern population structure of Northern Europe.
That is the final layer of the migration equation.
The Three-Map Alignment — Sephardic Distribution, Haplogroup I1, and the Combined Genetic Network
What is being shown is not three separate interpretations. It is one continuous pattern expressed through three independent systems: modern population testing, haplogroup distribution, and individual-level genealogy with autosomal matches. When these three are placed side by side, they do not contradict each other. They align into a single geographic structure.

The first layer is the MyHeritage Sephardic Jewish tester distribution map. This map is built from present-day individuals who carry Sephardic-associated genetic signals. The locations are not theoretical—they are where living testers are concentrated. The highest densities appear across Israel (22.8%), Cyprus (9%), Italy (8.3%), Malta (6.6%), and Greece (6.3%), forming the Eastern and Central Mediterranean core. From there, the pattern extends west into Portugal (3.2%) and Spain (2.2%), and then expands outward into the Americas—Guatemala (7.4%), Panama (6.6%), Uruguay (5.8%), Argentina (5.4%), Venezuela (4.9%), Puerto Rico (4.8%), and Brazil (4.6%)—as well as into Western Europe through France (4%), Belgium (2.1%), and surrounding regions. What this map shows is the full Sephardic dispersion pattern as it exists today: Levant → Mediterranean → Iberia → Atlantic → Americas, with secondary presence across Central and Eastern Europe.

The second layer is the FTDNA Haplogroup I1 map. At first glance, this map appears Northern European, with its strongest concentrations in Scandinavia, Germany, and surrounding regions. But when placed against the Sephardic map, the relationship becomes clear. The I1 distribution fills the northern half of the same structure. It mirrors the expansion zones—British Isles, Germany, Netherlands, Scandinavia, Baltic regions—which correspond directly to Phase 7 of the migration system. It does not contradict the Sephardic map; it completes it from the north. Where the MyHeritage map shows the Mediterranean and Atlantic dispersal, the I1 map shows the final settlement and population concentration after expansion, particularly along the Hanseatic and northern trade corridors.

The third layer is the Genomelink combined map, which merges genealogy, autosomal DNA matches, and haplogroup-linked individuals into a single plotted system. This is the critical bridge between the other two maps. Unlike MyHeritage, which is population-based, and FTDNA, which is haplogroup-frequency-based, this map operates at the individual connection level. Each plotted point represents real relational data—family lines, DNA matches, and shared genetic segments. When zoomed in, dense clusters appear in Western and Central Europe, especially in regions like Germany, the Netherlands, France, and the British Isles, with some locations containing dozens of related individuals tied to a single city. When zoomed out, the map expands and fills the same global pattern seen in the Sephardic distribution—extending into North Africa, the Near East, and the Americas.
This is where the alignment becomes undeniable. The Genomelink map does not just resemble the other two—it connects them. The Mediterranean and Iberian zones seen in the MyHeritage Sephardic map are populated in the Genomelink data through family lines and matches. The Northern European zones seen in the I1 map are also populated in the Genomelink data through the same network. The result is a continuous chain: origin zones, migration corridors, and expansion regions all linked through actual individuals rather than abstract percentages.
When all three maps are viewed together, they form a closed system. The MyHeritage map defines the Sephardic population distribution. The I1 map defines the northern expansion and concentration of the paternal line. The Genomelink map provides the relational network that physically connects both ends, showing that the same lineage spans all regions.
This is why the pattern holds even when different testing companies smooth or relabel the DNA. The percentages may shift, the categories may change, and the labels may be reassigned, but the geographic structure does not move. The same Mediterranean core, Iberian corridor, Central European bridge, and Northern expansion appear in every system once the data is laid out spatially.
So the three-map alignment is not an interpretation layered onto the data. It is the structure that remains when all smoothing, labeling, and categorization are stripped away and the raw geographic relationships are allowed to speak for themselves.
The Three Corridor Maps — Internal Evidence of Movement and Structural Alignment
These three maps form a closed system. They are not separate ideas placed next to each other—they are three segments of the same movement, recorded through names, dates, trade routes, and geographic continuity. When read together, they explain why the chromosome patterns, STR markers, haplogroup distribution, and modern DNA tester locations all align into one structure.

The Mediterranean–European map establishes the central spine of the system. It begins in the Levant, marked by the early Shimʿon timeline, and moves forward through the key historical hinges of 63 BCE and 70 CE, where dispersion accelerates. From there, the movement spreads into Italy and Sicily, then across the Mediterranean basin, into North Africa, and west into Iberia (Spain and Portugal). The map shows the name continuity—Simon, Simeon, Simonis—carried across these regions and preserved through later periods such as the converso era in Iberia (1580–1640). From Iberia, the path moves north into France, Belgium, the Netherlands, Germany, and Central Europe, and east into the Balkans and Anatolia. This map is the structural backbone that connects Phases 3, 4, and 5. It shows how the lineage did not appear in Europe randomly—it moved there through defined corridors, with recorded presence at each stage.

The African Corridor map extends that same system southward. Instead of breaking away from the Mediterranean structure, it continues it. The North African coastal regions—Morocco, Algeria, Tunisia, Libya, and Egypt—act as a bridge between the Mediterranean world and the interior of Africa. The map shows movement from these regions down the Nile corridor into Sudan and Ethiopia, then further into East Africa (Kenya, Tanzania) and into the southern regions (Zambia, Zimbabwe, South Africa). The presence of the Simonis/Simeon naming pattern along this entire route, with dated records, shows continuity rather than isolated appearance. This is why the African signal in the autosomal data never disappears. Even when reduced to small percentages, it reflects a real corridor that remained connected to the main system. This map corresponds directly to Phase 6 and explains the persistence of North African and Egyptian components, even when they are heavily smoothed.

The Indus–Levant–Africa corridor map explains the eastern layer that feeds into both of the others. It shows the movement from the Indus and South Asian regions, through Iran and Mesopotamia, into the Levant, and then branching into both the Mediterranean and African systems. This is not a separate migration—it is an earlier and deeper layer of the same network. The map places haplogroup transitions and trade routes along this corridor, showing how movement was not only westward but also integrated through long-standing exchange routes. This is the structural reason why Levantine, Persian, and even deeper eastern signals appear in the chromosome-level analysis, even when they are reduced to very small percentages in population panels. These signals are not noise—they are remnants of a connected corridor that fed into the Mediterranean and African expansions.
When all three maps are read together, they form a continuous loop. The eastern corridor feeds into the Levant. The Levant expands into the Mediterranean and Iberia. From there, the system branches both north into Europe and south into Africa. Those branches do not disconnect—they remain linked through trade, movement, and recorded presence. Over time, the northern branch becomes the dominant population center, which is why modern DNA tests heavily weight Northern and Northwestern Europe. But the underlying structure—the one shown in these maps—remains intact beneath that surface.
This is why the genetic data behaves the way it does. The chromosomes show layered signals instead of a single origin. The STR markers maintain continuity along the paternal line across regions. The modern tester maps show the final distribution of the same system, while the haplogroup map shows where that system concentrated in later phases. The combined Genomelink map then connects all of it at the individual level, showing that these regions are not just historically linked; they are still connected through real genetic relationships.
So these three maps are not supporting pieces—they are the movement framework. They show the “why” behind the DNA, the haplogroup spread, and the modern population distribution. Without them, the data looks fragmented. With them, it becomes a single continuous system running from the Indus and Levant, through the Mediterranean and Africa, into Central Europe, and finally into the Northern expansion zones.
Conclusion — European Sephardic DNA and the Mechanism of Washout
When everything is brought together—the chromosomes, the STR structure, the three corridor maps, the haplogroup spread, and the cross-platform results—the conclusion becomes clear: the signal identified as European Sephardic is not absent. It is layered, redistributed, and then compressed into a Northern European frame by population models.
The lineage shown across the phases is not a single-region origin. It is a multi-corridor system formed through movement from the Levant into the Mediterranean, into Iberia, across Central Europe, and outward into both Africa and the Northern expansion zones. That movement is preserved in the chromosomes through recurring segment patterns, in the STR markers through father-to-son continuity, and in the maps through geographic alignment of names, routes, and populations. None of those systems disappear. They persist. What changes is how they are interpreted.
Modern DNA platforms are built on reference populations, and those populations are strongest where sample sizes are largest and most stable. Northern and Northwestern Europe—especially the British Isles, Germany, and Scandinavia—have the deepest and most densely sampled datasets. When a genome like this one reaches Phase 7 and overlaps heavily with those populations through historical expansion, the models begin to anchor the entire genome to that region. The deeper layers are not removed—they are absorbed into the closest available reference clusters.
That is the mechanism of washout.
The Mediterranean layers—Iberian, Italian, Balkan, and Central European Jewish components—do not vanish. They are broken into smaller segments and either reduced to minor percentages or reassigned into broader Northern European categories that statistically overlap with them. The same happens with the Levantine, North African, and Eastern corridor signals. These appear only as trace values in some platforms, or are not reported at all, not because they are missing, but because the model cannot cleanly separate them once they have been distributed across multiple regions through migration.
This is why different companies produce different results from the same DNA. AncestryDNA compresses the structure the most, presenting a heavily Northern European profile with minimal visible Mediterranean depth. FTDNA and MyHeritage preserve slightly more structure but still prioritize northern weighting. Genomelink retains the most visible layering, showing Iberian, Italian, Balkan, Near Eastern, and Jewish-associated components alongside the northern majority. The variation is not in the DNA itself—it is in how much of the underlying structure each system chooses to resolve before smoothing it.
The chromosome-level analysis confirms this. Instead of a single continuous Northern European signal, the genome shows repeating segment patterns across multiple chromosomes that correspond to Mediterranean, Balkan, and Eastern corridors. The STR markers reinforce this by maintaining a stable paternal signature that does not align with a purely Northern European origin. The maps then complete the picture by showing that the same lineage exists across all the regions where those signals appear, both historically and in modern populations.
So the “Northern European majority” seen in the results is not a contradiction of a Sephardic structure. It is the final statistical position of a lineage that has already moved through all earlier phases. The system has expanded north, settled, and been absorbed into those populations at scale. The models then use that scale to define the identity of the entire genome.
European Sephardic DNA, in this context, is not a single percentage that can be cleanly labeled. It is a distributed system embedded across multiple regions, with its strongest visibility in the Mediterranean and its largest population footprint in Northern Europe. Because of that distribution, it is especially vulnerable to smoothing. The more widespread and interconnected the lineage becomes, the easier it is for population models to collapse it into broader categories.
That is why the signal appears “washed out.” It is not gone. It is spread across the structure and reassigned through statistical grouping, leaving only smaller visible fragments in standard ethnicity panels.
When the smoothing is set aside and the full structure is examined—chromosomes, STRs, migration corridors, and geographic alignment—the underlying pattern remains intact.


