[fix](ann-index) Fix ANN IVF/PQ recall, avoid init-time large ANN build-buffer reservation, and skip ANN index build for segments with insufficient rows. by kaka11chen · Pull Request #64082 · apache/doris

kaka11chen · 2026-06-03T12:01:23Z

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary:

This PR fixes several ANN index build issues:

- The ANN index writer previously pre-reserved `ann_index_build_chunk_size * dim` floats during init, which could allocate excessive memory immediately for high-dimensional vectors.
- For indexes that require training, such as IVF, IVF_ON_DISK, and PQ-quantized indexes, chunk-level training could train FAISS on only part of the segment data, causing poor or even zero recall.
- IVF_ON_DISK did not use `nlist` as its minimum FAISS training row requirement.
- Segments with fewer rows than the minimum training requirement could fail during build instead of skipping ANN index persistence.
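
To make the first point concrete, the size of such an up-front reservation grows linearly with the vector dimension. The helper below is only a sketch of that arithmetic: `reserved_bytes` is a hypothetical name, `chunk_rows` stands in for `ann_index_build_chunk_size`, and the numbers in the note after the block are illustrative, not Doris defaults.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Doris): bytes taken by a buffer that
// pre-reserves `chunk_rows * dim` floats before any row has been written.
inline std::uint64_t reserved_bytes(std::uint64_t chunk_rows, std::uint64_t dim) {
    assert(dim > 0);  // an ANN column always has a positive dimension
    return chunk_rows * dim * sizeof(float);
}
```

With an illustrative chunk of 1,000,000 rows at dimension 1536 and a 4-byte `float`, `reserved_bytes(1000000, 1536)` is 6,144,000,000 bytes (about 5.7 GiB), paid at init even if the segment ends up nearly empty.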

This PR changes the build behavior as follows:

- Remove the large init-time build-buffer reservation.
- Buffer segment vectors and train indexes that require training once, on the full segment data.
- Skip persisting ANN indexes for empty or too-small segments.
- Add `ann_index_build_min_segment_rows` so small ANN indexes can be skipped by a Doris-side row threshold.
- Treat IVF_ON_DISK minimum training rows consistently with IVF.

Release note

Fix ANN IVF/PQ recall, avoid init-time large ANN build-buffer reservation, and skip ANN index build for segments with insufficient training rows.

Check List (For Author)

Test
- Regression test
  - ./run-regression-test.sh --run -d ann_index_p0 -s ivf_pq_full_buffer_train_recall
- Unit Test
- Manual test
  - run buildall
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes. ANN indexes that require training now train once with full segment data. Segments with insufficient training rows skip ANN index build instead of failing.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-06-03T12:01:30Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

kaka11chen · 2026-06-03T12:01:54Z

run buildall

kaka11chen · 2026-06-03T12:11:37Z

run buildall

kaka11chen · 2026-06-03T13:27:45Z

run buildall

hello-stephen · 2026-06-03T17:05:18Z

TPC-H: Total hot run time: 29269 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 582071f0eeb96aa5a7754df2d0ef4ec12745e462, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17827	4092	4069	4069
q2	q3	10781	1462	831	831
q4	4692	477	345	345
q5	7544	876	599	599
q6	181	170	137	137
q7	773	873	627	627
q8	9412	1593	1572	1572
q9	5712	4555	4464	4464
q10	6760	1803	1534	1534
q11	440	277	253	253
q12	634	440	289	289
q13	18142	3384	2813	2813
q14	267	264	240	240
q15	q16	833	782	713	713
q17	1020	947	950	947
q18	6921	5860	5590	5590
q19	1362	1186	1034	1034
q20	518	413	253	253
q21	6177	2870	2637	2637
q22	472	376	322	322
Total cold run time: 100468 ms
Total hot run time: 29269 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5113	4833	4845	4833
q2	q3	4887	5277	4748	4748
q4	2119	2219	1416	1416
q5	4790	4861	4656	4656
q6	228	176	127	127
q7	1850	1827	1561	1561
q8	2405	2136	2159	2136
q9	7877	7608	7402	7402
q10	4720	4691	4229	4229
q11	534	386	352	352
q12	727	756	522	522
q13	3013	3363	2818	2818
q14	281	278	263	263
q15	q16	675	713	612	612
q17	1287	1259	1259	1259
q18	7369	6856	6835	6835
q19	1129	1122	1135	1122
q20	2225	2206	1942	1942
q21	5278	4622	4494	4494
q22	516	476	419	419
Total cold run time: 57023 ms
Total hot run time: 51746 ms

hello-stephen · 2026-06-03T17:16:12Z

TPC-DS: Total hot run time: 169468 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 582071f0eeb96aa5a7754df2d0ef4ec12745e462, data reload: false

query5	4322	636	479	479
query6	443	203	184	184
query7	4814	565	303	303
query8	378	220	208	208
query9	8783	4008	4018	4008
query10	475	331	259	259
query11	5927	2321	2229	2229
query12	158	103	100	100
query13	1264	625	455	455
query14	6423	5360	4995	4995
query14_1	4369	4344	4350	4344
query15	211	201	177	177
query16	1017	449	434	434
query17	921	679	576	576
query18	2452	468	342	342
query19	198	176	138	138
query20	110	111	104	104
query21	218	137	115	115
query22	13736	13525	13321	13321
query23	17626	16887	16554	16554
query23_1	16584	16352	16254	16254
query24	7673	1716	1296	1296
query24_1	1323	1310	1338	1310
query25	572	469	418	418
query26	1294	332	172	172
query27	2675	530	351	351
query28	4453	2033	2003	2003
query29	1095	641	508	508
query30	305	239	201	201
query31	1130	1092	957	957
query32	121	68	62	62
query33	546	322	274	274
query34	1188	1126	646	646
query35	764	789	686	686
query36	1379	1457	1231	1231
query37	157	107	93	93
query38	3192	3133	3056	3056
query39	920	921	890	890
query39_1	889	903	872	872
query40	223	158	101	101
query41	65	62	62	62
query42	98	96	93	93
query43	320	321	282	282
query44	
query45	196	187	178	178
query46	1094	1250	770	770
query47	2344	2422	2254	2254
query48	411	401	306	306
query49	628	468	359	359
query50	994	351	265	265
query51	4359	4337	4220	4220
query52	87	90	77	77
query53	237	271	199	199
query54	262	213	195	195
query55	80	77	69	69
query56	219	239	246	239
query57	1460	1395	1338	1338
query58	247	215	212	212
query59	1580	1620	1415	1415
query60	281	283	226	226
query61	160	160	155	155
query62	705	671	591	591
query63	224	181	186	181
query64	2559	767	628	628
query65	
query66	1821	469	348	348
query67	29740	29774	29591	29591
query68	
query69	430	295	263	263
query70	988	929	962	929
query71	305	225	251	225
query72	3008	2772	2419	2419
query73	843	740	465	465
query74	5129	4932	4768	4768
query75	2662	2589	2264	2264
query76	2329	1141	765	765
query77	365	371	294	294
query78	12462	12331	12014	12014
query79	1234	1056	767	767
query80	514	480	385	385
query81	445	282	241	241
query82	233	159	123	123
query83	275	278	248	248
query84	294	139	111	111
query85	833	531	434	434
query86	346	298	304	298
query87	3352	3345	3236	3236
query88	3608	2729	2721	2721
query89	414	384	328	328
query90	2190	183	179	179
query91	173	167	137	137
query92	63	61	54	54
query93	1481	1478	849	849
query94	549	356	322	322
query95	664	373	344	344
query96	1098	825	340	340
query97	2721	2699	2575	2575
query98	213	206	203	203
query99	1159	1175	1053	1053
Total cold run time: 251293 ms
Total hot run time: 169468 ms

hello-stephen · 2026-06-03T18:01:54Z

BE UT Coverage Report

Increment line coverage 96.20% (76/79) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.91% (21032/39013)
Line Coverage	37.57% (199809/531817)
Region Coverage	33.67% (156804/465736)
Branch Coverage	34.63% (68582/198021)

hello-stephen · 2026-06-03T18:02:19Z

BE UT Coverage Report

Increment line coverage 96.20% (76/79) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.92% (21036/39013)
Line Coverage	37.61% (199993/531817)
Region Coverage	33.69% (156893/465736)
Branch Coverage	34.65% (68611/198021)

hello-stephen · 2026-06-03T18:07:44Z

BE Regression && UT Coverage Report

Increment line coverage 96.25% (77/80) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.88% (27478/38229)
Line Coverage	55.48% (294351/530545)
Region Coverage	52.30% (245926/470246)
Branch Coverage	53.41% (106186/198814)

- Test: Regression test
  - ./run-regression-test.sh --run -d ann_index_p0 -s ivf_pq_full_buffer_train_recall
- Behavior changed: No
- Does this need documentation: No

kaka11chen · 2026-06-04T02:09:59Z

run buildall

hello-stephen · 2026-06-04T03:23:40Z

BE UT Coverage Report

Increment line coverage 96.20% (76/79) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.92% (21045/39030)
Line Coverage	37.57% (199993/532279)
Region Coverage	33.66% (156901/466109)
Branch Coverage	34.62% (68635/198252)

### What problem does this PR solve?

Issue Number: None

Related PR: apache#64082

Problem Summary: Clarify why ANN index writer swaps the buffered vectors with an empty PODArray instead of using clear(). The swap intentionally releases the full-segment training buffer before saving the index, while clear() would keep the allocated capacity.

### Release note

None

### Check List (For Author)

- Test: No need to test (comment-only change)
- Behavior changed: No
- Does this need documentation: No
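
The swap-versus-clear() distinction in this commit message can be illustrated with `std::vector`, used here as a stand-in for Doris's `PODArray` on the assumption that the two behave alike in this respect. The function names are hypothetical and the sketch is not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Drops the elements but keeps the allocation; returns the remaining capacity.
inline std::size_t release_by_clear(std::vector<float>& buf) {
    buf.clear();
    return buf.capacity();
}

// Hands the allocation to a temporary that frees it when it goes out of scope;
// returns the capacity left in `buf` afterwards.
inline std::size_t release_by_swap(std::vector<float>& buf) {
    std::vector<float> empty;
    buf.swap(empty);  // `empty` now owns the old storage
    return buf.capacity();
}
```

After `release_by_clear` the vector is empty but still holds its storage, whereas after `release_by_swap` the storage has been returned to the allocator, which is the effect the commit wants before the index is saved.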

hello-stephen · 2026-06-04T05:20:35Z

BE Regression && UT Coverage Report

Increment line coverage 96.25% (77/80) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.93% (27512/38246)
Line Coverage	55.52% (294804/531007)
Region Coverage	52.27% (246069/470731)
Branch Coverage	53.42% (106351/199101)

### What problem does this PR solve?

Issue Number: None

Related PR: apache#64082

Problem Summary: Remove the redundant ANN writer `_skip_build` state. The flag was only set from `close_on_error()`, while normal index skip behavior is already driven by zero rows or by the segment row count being smaller than the index training requirement. Keeping the writer state explicit avoids carrying an abort flag into regular add and finish paths.

### Release note

None

### Check List (For Author)

- Test: Unit Test
  - `ENABLE_PCH=OFF ./run-be-ut.sh --run --filter=AnnIndexWriterTest.*`
- Behavior changed: No
- Does this need documentation: No

hello-stephen · 2026-06-04T08:19:44Z

TPC-H: Total hot run time: 29312 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 04d8a048174595050f3fb6792f07bf1a7aceee6b, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17632	4048	4021	4021
q2	q3	10781	1436	837	837
q4	4689	493	366	366
q5	7556	903	581	581
q6	185	177	140	140
q7	770	874	634	634
q8	9371	1659	1548	1548
q9	5898	4562	4546	4546
q10	6776	1815	1556	1556
q11	432	274	254	254
q12	620	437	294	294
q13	18193	3433	2764	2764
q14	276	265	242	242
q15	q16	834	775	717	717
q17	973	896	876	876
q18	6876	5875	5679	5679
q19	1319	1246	1091	1091
q20	520	403	263	263
q21	6325	2963	2591	2591
q22	476	379	312	312
Total cold run time: 100502 ms
Total hot run time: 29312 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5108	4828	4776	4776
q2	q3	4855	5355	4822	4822
q4	2113	2241	1419	1419
q5	4809	4819	4694	4694
q6	237	177	132	132
q7	1921	1817	1594	1594
q8	2418	2128	2096	2096
q9	7914	7449	7379	7379
q10	4777	4721	4263	4263
q11	534	395	359	359
q12	738	748	531	531
q13	3059	3380	2769	2769
q14	273	277	254	254
q15	q16	686	710	625	625
q17	1299	1267	1263	1263
q18	7380	6959	6861	6861
q19	1125	1133	1126	1126
q20	2240	2223	1950	1950
q21	5363	4627	4519	4519
q22	524	453	422	422
Total cold run time: 57373 ms
Total hot run time: 51854 ms

hello-stephen · 2026-06-04T08:30:46Z

TPC-DS: Total hot run time: 169349 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 04d8a048174595050f3fb6792f07bf1a7aceee6b, data reload: false

query5	4317	622	485	485
query6	450	201	183	183
query7	4865	567	308	308
query8	375	222	210	210
query9	8782	4092	4111	4092
query10	457	327	273	273
query11	5843	2368	2195	2195
query12	163	106	102	102
query13	1280	636	467	467
query14	6416	5407	5077	5077
query14_1	4438	4437	4365	4365
query15	205	194	177	177
query16	976	454	435	435
query17	912	699	563	563
query18	2494	488	343	343
query19	203	186	143	143
query20	108	111	104	104
query21	218	143	124	124
query22	13600	13557	13500	13500
query23	17287	16515	16245	16245
query23_1	16364	16332	16395	16332
query24	7577	1753	1324	1324
query24_1	1324	1345	1319	1319
query25	557	461	388	388
query26	1287	335	178	178
query27	2671	555	329	329
query28	4495	2049	2063	2049
query29	1066	619	480	480
query30	323	241	197	197
query31	1119	1084	946	946
query32	117	64	56	56
query33	535	319	258	258
query34	1209	1177	656	656
query35	753	799	674	674
query36	1379	1393	1249	1249
query37	160	104	92	92
query38	3216	3165	3057	3057
query39	943	925	892	892
query39_1	888	881	862	862
query40	223	125	103	103
query41	69	60	62	60
query42	95	93	93	93
query43	332	336	286	286
query44	
query45	193	190	178	178
query46	1093	1217	759	759
query47	2335	2393	2217	2217
query48	408	430	282	282
query49	634	463	370	370
query50	1025	353	263	263
query51	4350	4310	4318	4310
query52	90	90	80	80
query53	250	272	197	197
query54	272	218	199	199
query55	80	74	71	71
query56	238	227	219	219
query57	1447	1405	1316	1316
query58	258	223	216	216
query59	1628	1718	1454	1454
query60	285	254	232	232
query61	163	176	187	176
query62	703	690	581	581
query63	244	222	191	191
query64	2553	793	633	633
query65	
query66	1804	473	339	339
query67	29818	29748	29100	29100
query68	
query69	422	313	270	270
query70	984	973	965	965
query71	304	222	213	213
query72	2969	2718	2432	2432
query73	871	762	448	448
query74	5137	4991	4793	4793
query75	2716	2602	2278	2278
query76	2339	1173	796	796
query77	370	401	319	319
query78	12501	12381	11940	11940
query79	1274	1043	784	784
query80	547	504	425	425
query81	455	290	249	249
query82	242	164	126	126
query83	284	299	265	265
query84	298	158	124	124
query85	952	624	527	527
query86	336	323	295	295
query87	3382	3341	3219	3219
query88	3621	2813	2739	2739
query89	415	378	332	332
query90	2220	186	193	186
query91	176	164	136	136
query92	67	65	54	54
query93	1555	1464	845	845
query94	533	340	330	330
query95	712	490	352	352
query96	1069	837	349	349
query97	2707	2729	2569	2569
query98	209	210	204	204
query99	1174	1164	1037	1037
Total cold run time: 251246 ms
Total hot run time: 169349 ms

…added no-train indexes during segment writing. This made the build strategy harder to reason about and could still spend CPU/memory building small HNSW/FLAT segments that should be skipped by a Doris-side row threshold. This change removes the chunk add configs, buffers ANN vectors for the whole segment, applies effective_min_rows = max(vector_index->get_min_train_rows(), config::ann_index_build_min_segment_rows) in finish(), and then trains when needed, adds once, releases the build buffer, and saves the index. Empty segments or segments below the effective threshold delete only the current index entry instead of persisting an ANN index. Add BE config ann_index_build_min_segment_rows to skip persisting ANN indexes for small segments. Remove ann_index_build_add_chunk_size and ann_index_build_add_chunk_bytes.
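
The threshold rule described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the writer's actual code: `BuildAction` and `decide_build` are hypothetical names, `min_train_rows` stands for the value returned by `vector_index->get_min_train_rows()`, and `min_segment_rows` stands for `config::ann_index_build_min_segment_rows`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical outcome of the finish-time check.
enum class BuildAction { kSkip, kBuild };

// Decides whether a segment's ANN index should be built and persisted.
inline BuildAction decide_build(std::int64_t total_rows, std::int64_t min_train_rows,
                                std::int64_t min_segment_rows) {
    assert(min_train_rows >= 0 && min_segment_rows >= 0);
    // The stricter of the index's training requirement and the Doris-side threshold wins.
    const std::int64_t effective_min_rows = std::max(min_train_rows, min_segment_rows);
    if (total_rows == 0 || total_rows < effective_min_rows) {
        return BuildAction::kSkip;  // empty or too small: do not persist an index
    }
    return BuildAction::kBuild;  // train if needed, add once, release buffer, save
}
```

Because the maximum of the two thresholds is used, a no-train index such as HNSW or FLAT with a training requirement of zero can still be skipped purely by the configured row threshold.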

kaka11chen · 2026-06-04T10:59:00Z

run buildall

hello-stephen · 2026-06-04T14:17:26Z

TPC-H: Total hot run time: 29312 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d32d73cd11c7503bcc32ac3d0e28e8f09b2b87c1, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17897	3982	4101	3982
q2	q3	10839	1458	833	833
q4	4686	475	339	339
q5	7588	911	596	596
q6	184	175	135	135
q7	767	872	634	634
q8	9343	1661	1589	1589
q9	5829	4511	4486	4486
q10	6814	1811	1537	1537
q11	443	270	256	256
q12	644	421	290	290
q13	18172	3438	2780	2780
q14	266	254	239	239
q15	q16	814	768	709	709
q17	929	910	1017	910
q18	7065	5750	5539	5539
q19	1354	1271	1106	1106
q20	524	403	266	266
q21	6343	2877	2769	2769
q22	470	379	317	317
Total cold run time: 100971 ms
Total hot run time: 29312 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5022	4624	4766	4624
q2	q3	4872	5324	4706	4706
q4	2090	2148	1383	1383
q5	4743	4923	4669	4669
q6	228	175	129	129
q7	1851	1764	1591	1591
q8	2369	2071	2050	2050
q9	7974	7557	7442	7442
q10	4741	4679	4228	4228
q11	526	380	350	350
q12	717	739	523	523
q13	3006	3410	2790	2790
q14	283	273	244	244
q15	q16	671	687	606	606
q17	1280	1258	1248	1248
q18	7356	6784	6751	6751
q19	1129	1103	1094	1094
q20	2217	2223	1930	1930
q21	5244	4561	4423	4423
q22	542	453	405	405
Total cold run time: 56861 ms
Total hot run time: 51186 ms

hello-stephen · 2026-06-04T14:28:17Z

TPC-DS: Total hot run time: 169025 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d32d73cd11c7503bcc32ac3d0e28e8f09b2b87c1, data reload: false

query5	4337	627	483	483
query6	455	192	177	177
query7	4805	506	304	304
query8	366	215	213	213
query9	8811	4061	4064	4061
query10	443	310	263	263
query11	5965	2338	2212	2212
query12	163	106	101	101
query13	1248	582	417	417
query14	6369	5398	5051	5051
query14_1	4413	4394	4416	4394
query15	205	193	175	175
query16	970	438	416	416
query17	915	681	568	568
query18	2434	462	345	345
query19	200	179	136	136
query20	107	108	105	105
query21	218	135	118	118
query22	13624	13574	13339	13339
query23	17455	16536	16229	16229
query23_1	16197	16366	16296	16296
query24	7677	1741	1280	1280
query24_1	1299	1325	1329	1325
query25	547	453	370	370
query26	1321	298	165	165
query27	2694	536	337	337
query28	4491	2047	2002	2002
query29	1095	612	519	519
query30	310	240	202	202
query31	1110	1076	954	954
query32	106	61	59	59
query33	520	318	246	246
query34	1183	1108	631	631
query35	745	780	690	690
query36	1395	1400	1266	1266
query37	151	101	91	91
query38	3165	3135	3051	3051
query39	937	942	896	896
query39_1	880	877	871	871
query40	225	119	100	100
query41	64	66	63	63
query42	95	96	94	94
query43	317	321	272	272
query44	
query45	198	190	184	184
query46	1078	1233	754	754
query47	2346	2377	2315	2315
query48	405	413	293	293
query49	633	472	362	362
query50	942	351	244	244
query51	4331	4284	4229	4229
query52	87	86	75	75
query53	243	271	192	192
query54	261	218	195	195
query55	78	77	70	70
query56	234	230	215	215
query57	1419	1400	1322	1322
query58	241	213	198	198
query59	1575	1647	1425	1425
query60	292	245	227	227
query61	157	157	157	157
query62	698	652	601	601
query63	230	185	188	185
query64	2582	822	643	643
query65	
query66	1785	458	346	346
query67	29069	29812	29460	29460
query68	
query69	424	304	273	273
query70	955	922	932	922
query71	306	235	207	207
query72	2950	2742	2408	2408
query73	855	765	418	418
query74	5165	4980	4775	4775
query75	2647	2545	2255	2255
query76	2341	1134	781	781
query77	360	366	298	298
query78	12464	12409	11907	11907
query79	1270	1063	764	764
query80	575	503	428	428
query81	450	285	246	246
query82	235	156	122	122
query83	273	284	260	260
query84	300	149	120	120
query85	921	601	515	515
query86	328	297	289	289
query87	3388	3352	3204	3204
query88	3672	2799	2775	2775
query89	422	379	333	333
query90	2161	180	189	180
query91	197	187	152	152
query92	64	66	61	61
query93	1428	1490	892	892
query94	551	383	318	318
query95	669	466	344	344
query96	1026	779	338	338
query97	2704	2726	2558	2558
query98	223	208	203	203
query99	1170	1167	1028	1028
Total cold run time: 249831 ms
Total hot run time: 169025 ms

yiguolei · 2026-06-05T03:50:46Z


    _dir = compound_dir.value();

+    _min_segment_rows = AnnIndexColumnWriter::min_segment_rows();


What is this line of code doing?

Minimum segment rows required to persist an ANN index.

yiguolei · 2026-06-05T03:55:59Z

+    return Status::OK();
+}
+
+Status AnnIndexColumnWriter::_build_and_save(Int64 min_train_rows, Int64 effective_min_rows) {


Why does this function need the min_train_rows parameter?

### What problem does this PR solve?

Issue Number: None

Related PR: apache#64082

Problem Summary: The ANN writer buffers vectors through an internal helper after validating array dimensions in add_array_values(). Add a short comment to make the validation precondition explicit for the buffer helper path.

### Release note

None

### Check List (For Author)

- Test: Manual test
  - Ran git diff --check
- Behavior changed: No
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: apache#64082

Problem Summary: The ANN writer used a tiny helper only to compute max(min_train_rows, ann_index_build_min_segment_rows). Inline the single-use calculation in finish() to keep the build threshold logic local and reduce unnecessary indirection.

### Release note

None

### Check List (For Author)

- Test: Manual test
  - Ran git diff --check
  - Ran rg to verify _effective_min_rows has no remaining references
- Behavior changed: No
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: apache#64082

Problem Summary: The ANN writer had small single-use helpers and a cached min segment rows member after switching to finish-time buffering. Inline vector buffering, buffer release, and direct ann_index_build_min_segment_rows access at their call sites to keep the writer implementation simpler.

### Release note

None

### Check List (For Author)

- Test: Manual test
  - Ran git diff --check
  - Ran rg to verify _append_vectors_to_buffer, _release_buffered_vectors, _min_segment_rows, and min_segment_rows() have no remaining references
- Behavior changed: No
- Does this need documentation: No

kaka11chen · 2026-06-05T06:59:08Z

run buildall

github-actions · 2026-06-05T07:15:53Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-06-05T07:15:56Z

PR approved by anyone and no changes requested.

hello-stephen · 2026-06-05T08:22:26Z

BE UT Coverage Report

Increment line coverage 92.50% (37/40) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.77% (21056/39158)
Line Coverage	37.46% (200140/534243)
Region Coverage	33.49% (156927/468642)
Branch Coverage	34.52% (68655/198877)

hello-stephen · 2026-06-05T13:15:52Z

TPC-H: Total hot run time: 29625 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 388ae6f272fe02af5a674b51da76f70fb986364a, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17834	4043	4090	4043
q2	q3	10736	1426	854	854
q4	4685	478	349	349
q5	7563	891	593	593
q6	186	175	136	136
q7	777	858	670	670
q8	9385	1665	1727	1665
q9	5755	4562	4562	4562
q10	6748	1825	1507	1507
q11	437	274	247	247
q12	638	436	297	297
q13	18111	3501	2798	2798
q14	267	258	243	243
q15	q16	832	778	714	714
q17	937	970	998	970
q18	6943	5807	5633	5633
q19	1327	1361	1144	1144
q20	511	416	265	265
q21	6471	2839	2624	2624
q22	455	387	311	311
Total cold run time: 100598 ms
Total hot run time: 29625 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5089	4719	4827	4719
q2	q3	4930	5294	4728	4728
q4	2135	2174	1416	1416
q5	4799	4971	4652	4652
q6	227	178	128	128
q7	1853	1758	1593	1593
q8	2480	2118	2106	2106
q9	7899	7534	7480	7480
q10	4769	4673	4241	4241
q11	544	390	358	358
q12	735	748	531	531
q13	3000	3310	2837	2837
q14	287	292	250	250
q15	q16	679	699	619	619
q17	1308	1278	1282	1278
q18	7334	6904	6835	6835
q19	1138	1141	1121	1121
q20	2232	2243	1964	1964
q21	5339	4578	4515	4515
q22	556	476	431	431
Total cold run time: 57333 ms
Total hot run time: 51802 ms

hello-stephen · 2026-06-05T13:26:46Z

TPC-DS: Total hot run time: 170617 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 388ae6f272fe02af5a674b51da76f70fb986364a, data reload: false

query5	4310	627	478	478
query6	455	198	188	188
query7	4923	558	299	299
query8	372	219	204	204
query9	8785	4057	3990	3990
query10	443	337	257	257
query11	5954	2329	2191	2191
query12	158	105	102	102
query13	1265	583	426	426
query14	6390	5412	5075	5075
query14_1	4390	4384	4431	4384
query15	208	207	182	182
query16	1003	485	417	417
query17	907	686	559	559
query18	2447	477	334	334
query19	199	188	141	141
query20	112	105	108	105
query21	225	137	119	119
query22	13552	13647	13350	13350
query23	17338	16597	16106	16106
query23_1	16288	16292	16242	16242
query24	7550	1774	1345	1345
query24_1	1284	1323	1295	1295
query25	598	445	390	390
query26	1298	320	159	159
query27	2680	567	351	351
query28	4503	2018	2007	2007
query29	1068	632	482	482
query30	307	233	203	203
query31	1130	1085	953	953
query32	102	65	62	62
query33	514	321	252	252
query34	1173	1154	654	654
query35	741	779	678	678
query36	1388	1419	1266	1266
query37	151	104	88	88
query38	3178	3173	3061	3061
query39	943	919	908	908
query39_1	898	881	873	873
query40	223	144	115	115
query41	65	63	63	63
query42	93	99	91	91
query43	319	322	274	274
query44	
query45	195	190	180	180
query46	1090	1240	742	742
query47	2364	2427	2271	2271
query48	394	405	294	294
query49	630	487	371	371
query50	1084	362	262	262
query51	4349	4310	4218	4218
query52	91	89	78	78
query53	246	269	199	199
query54	292	217	203	203
query55	79	74	71	71
query56	250	228	215	215
query57	1446	1432	1319	1319
query58	241	221	205	205
query59	1593	1656	1409	1409
query60	283	257	228	228
query61	162	171	161	161
query62	697	668	584	584
query63	237	189	178	178
query64	2561	803	677	677
query65	
query66	1843	468	339	339
query67	29948	29783	29623	29623
query68	
query69	430	308	269	269
query70	991	961	968	961
query71	297	227	216	216
query72	2917	2708	2387	2387
query73	868	738	444	444
query74	5161	4945	4748	4748
query75	2681	2601	2296	2296
query76	2340	1156	763	763
query77	360	382	300	300
query78	12801	12293	11950	11950
query79	1231	1065	784	784
query80	566	506	413	413
query81	458	291	255	255
query82	244	162	124	124
query83	284	297	255	255
query84	297	144	123	123
query85	904	611	508	508
query86	321	283	287	283
query87	3379	3368	3185	3185
query88	3622	2752	2731	2731
query89	412	384	336	336
query90	2119	187	179	179
query91	200	181	156	156
query92	69	63	62	62
query93	1475	1524	912	912
query94	559	402	320	320
query95	674	365	346	346
query96	1082	846	338	338
query97	2698	2704	2585	2585
query98	212	211	208	208
query99	1148	1167	1054	1054
Total cold run time: 251298 ms
Total hot run time: 170617 ms

yiguolei · 2026-06-08T01:54:00Z

/review

github-actions

I found one blocking issue in the ANN writer change. The PR improves small-segment skip behavior and fixes the IVF_ON_DISK min-training-row calculation, but replacing the old chunked build path with an unbounded full-segment PODArray buffer can make ANN index build allocate rows * dim * sizeof(float) outside Doris memory tracking until finish().

Critical checkpoint conclusions:

- Goal and tests: the PR targets IVF/PQ recall, init-time reserve removal, IVF_ON_DISK min-train handling, and small segment skip behavior; tests cover these behaviors, but they do not cover high-dimensional/large-segment memory pressure introduced by the new buffering strategy.
- Scope/focus: the implementation is mostly focused, but the removal of chunked add/train changes the memory bound of the writer substantially.
- Concurrency: no new shared mutable state or lock ordering issue was found in the changed writer path; existing FAISS OpenMP budget remains used for train/add.
- Lifecycle/static initialization: no new cross-TU static initialization or ownership-cycle issue was found.
- Configuration: ann_index_build_min_segment_rows is mutable, validated non-negative, and read at finish(), so runtime changes can affect subsequent segment builds.
- Compatibility/storage format: skipping a segment deletes the current index directory entry rather than writing a new format; reader load failure falls back to brute-force paths, so no storage-format incompatibility was found.
- Parallel paths: normal ANN build and IVF_ON_DISK min-train paths were considered; no missing parallel code path was found beyond the memory-bound regression noted inline.
- Error handling/data correctness: Status returns in the changed writer path are propagated; missing/small indexes appear to downgrade to data scan instead of returning incomplete results.
- Performance/memory: blocking issue found: full-segment buffering is unbounded and not MemTracker-reserved.
- Observability: existing logs are sufficient for skipped segments; no additional blocking observability issue found.

User focus: no additional user-provided review focus was specified.

github-actions · 2026-06-08T01:58:53Z

+    // The offsets check above guarantees every array row matches the ANN index dimension.
+    DCHECK(p != nullptr);
+    _buffered_vectors.insert(_buffered_vectors.end(), p, p + num_rows * dim);
+    _total_rows += cast_set<int64_t>(num_rows);


This changes the ANN writer from the old bounded chunk buffer (ann_index_build_add_chunk_size * dim) to retaining every vector in the segment until finish(). That makes the allocation segment_rows * dim * sizeof(float) in _buffered_vectors, and ANN dim is only validated as positive while segment splitting is not based on this buffer. For example, a high-dimensional ANN column can accumulate hundreds of MB or GB in this PODArray before FAISS train/add runs, and this allocation is not reserved against a Doris MemTracker. This reintroduces the OOM risk the PR is trying to avoid, just during load/append instead of init(). Please keep the training input bounded/tracked (or enforce a byte cap/reservation and fail cleanly) instead of unconditionally buffering the full segment.
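
One way to read the "byte cap and fail cleanly" suggestion is a buffer that refuses an append once a byte budget would be exceeded. The sketch below is hypothetical: `BoundedVectorBuffer` and `max_bytes` are illustrative names, `std::vector` stands in for `PODArray`, and integration with a Doris MemTracker is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bounded buffer: rejects an append that would push the buffered
// payload past `max_bytes`, so the caller can fail cleanly or skip the index.
struct BoundedVectorBuffer {
    std::size_t max_bytes;
    std::vector<float> data;

    // Appends `num_rows` vectors of `dim` floats starting at `p`.
    // Returns false and leaves the buffer untouched if the cap would be exceeded.
    // Overflow of the size arithmetic is not handled in this sketch.
    bool append(const float* p, std::size_t num_rows, std::size_t dim) {
        assert(p != nullptr && dim > 0);
        const std::size_t add_floats = num_rows * dim;
        const std::size_t new_bytes = (data.size() + add_floats) * sizeof(float);
        if (new_bytes > max_bytes) {
            return false;
        }
        data.insert(data.end(), p, p + add_floats);
        return true;
    }
};
```

A caller would construct the buffer with a budget, for example `BoundedVectorBuffer buf{64, {}}`, and turn a `false` return from `append` into an error status or an index skip rather than continuing to grow the buffer.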

yiguolei · 2026-06-08T02:15:43Z

skip check_coverage

ok

…ld-buffer reservation, and skip ANN index build for segments with insufficient rows. (apache#64082)

…ld-buffer reservation, and skip ANN index build for segments with insufficient rows. (#64216)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

Cherry-pick #64082

### Check List (For Author)

- Test
  - [ ] Regression test
  - [ ] Unit Test
  - [ ] Manual test (add detailed scripts or steps below)
  - [ ] No need to test or manual test. Explain why:
    - [ ] This is a refactor/code format and no logic has been changed.
    - [ ] Previous test can cover this change.
    - [ ] No code files have been changed.
    - [ ] Other reason
- Behavior changed:
  - [ ] No.
  - [ ] Yes.
- Does this need documentation?
  - [ ] No.
  - [ ] Yes.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label

kaka11chen and others added 2 commits June 1, 2026 16:42

[fix](be) Bound ANN build memory and train IVF indexes once

942cce4

[fix](ann-index) Fix ivf recall zero and oom.

3186914

Copilot AI review requested due to automatic review settings June 3, 2026 12:01

Copilot started reviewing on behalf of kaka11chen June 3, 2026 12:01

kaka11chen changed the title ~~Ann build full buffer no spill~~ [fix](ann-index) Fix ivf recall zero and oom. Jun 3, 2026

Copilot AI reviewed Jun 3, 2026

[refactor](be) Rename ANN build add chunk configs

582071f

kaka11chen force-pushed the ann-build-full-buffer-no-spill branch from a66b5d7 to 582071f Compare June 3, 2026 13:21

update.

04d8a04

- Test: Regression test - ./run-regression-test.sh --run -d ann_index_p0 -s ivf_pq_full_buffer_train_recall - Behavior changed: No - Does this need documentation: No

kaka11chen changed the title ~~[fix](ann-index) Fix ivf recall zero and oom.~~ [fix](ann-index) Fix ANN IVF/PQ recall, avoid init-time large ANN build-buffer reservation, and skip ANN index build for segments with insufficient training rows. Jun 4, 2026

yiguolei reviewed Jun 5, 2026

kaka11chen added 3 commits June 5, 2026 12:03

yiguolei approved these changes Jun 5, 2026

yiguolei added the dev/4.1.x and usercase labels Jun 5, 2026

github-actions Bot added the approved and reviewed labels Jun 5, 2026

yiguolei approved these changes Jun 8, 2026

github-actions Bot previously requested changes Jun 8, 2026

yiguolei merged commit 20d68fe into apache:master Jun 8, 2026
33 of 34 checks passed

github-actions Bot added the dev/4.1.x-conflict label Jun 8, 2026

kaka11chen mentioned this pull request Jun 8, 2026

[fix](ann-index) Fix ANN IVF/PQ recall, avoid init-time large ANN build-buffer reservation, and skip ANN index build for segments with insufficient rows. #64216

Merged

16 tasks

kaka11chen added a commit to kaka11chen/doris that referenced this pull request Jun 8, 2026

[fix](ann-index) Fix ANN IVF/PQ recall, avoid init-time large ANN bui…

66aa78a

…ld-buffer reservation, and skip ANN index build for segments with insufficient rows. (apache#64082)

yiguolei added dev/4.1.2-merged and removed dev/4.1.x dev/4.1.x-conflict labels Jun 9, 2026


		_dir = compound_dir.value();

		_min_segment_rows = AnnIndexColumnWriter::min_segment_rows();
