About initial data on Elasticsearch: validating performance without vector columns #798

Hello,
I ran a benchmark against Elasticsearch with VectorDBBench. After the data was loaded, I can't find any vector data in the Elasticsearch index. I also checked the downloaded dataset, and the vector column does exist in it. Is there anything I missed during initialization?
**Command**: **"vectordbbench elasticcloudhnsw --case-type Performance768D1M --k 10 --host <elastic_hostname> --port 9200 --user elastic --password <password> --m 16 --ef-construction 200 --search-concurrent --load-concurrency 8 --num-concurrency 1,10,50,100 --scheme http"**
**Output**: 
2026-06-15 10:07:22,850 | INFO: Task:
TaskConfig(db=<DB.ElasticCloud: 'ElasticCloud'>, db_config=ElasticCloudConfig(db_label='2026-06-15T10:07:22.760305', version='', note='', cloud_id=None, scheme='http', host='es-cn-nyw4tu5hi0001yfnk.elasticsearch.aliyuncs.com', port=9200, user='elastic', password=SecretStr('**********')), db_case_config=ElasticCloudIndexConfig(element_type=<ESElementType.float: 'float'>, index=<IndexType.ES_HNSW: 'hnsw'>, number_of_shards=1, number_of_replicas=0, refresh_interval='30s', merge_max_thread_count=8, use_rescore=False, oversample_ratio=2.0, use_routing=False, use_force_merge=True, metric_type=None, efConstruction=200, M=16, num_candidates=100), case_config=CaseConfig(case_id=<CaseType.Performance768D1M: 5>, custom_case={}, k=10, concurrency_search_config=ConcurrencySearchConfig(num_concurrency=[1, 10, 50, 100], concurrency_duration=30, concurrency_timeout=3600)), stages=['drop_old', 'load', 'search_serial', 'search_concurrent'], load_concurrency=4)
 (cli.py:659) (3145)
2026-06-15 10:07:22,851 | INFO: generated uuid for the tasks: 070569c9f508415f980745148b566b32 (interface.py:73) (3145)
2026-06-15 10:07:23,216 | INFO | DB             | CaseType     Dataset               Filter | task_label (task_runner.py:411)
2026-06-15 10:07:23,217 | INFO | -----------    | ------------ -------------------- ------- | -------    (task_runner.py:411)
2026-06-15 10:07:23,217 | INFO | ElasticCloud-2026-06-15T10:07:22.760305 | Performance  Cohere-MEDIUM-1M         0.0 | 070569c9f508415f980745148b566b32 (task_runner.py:411)
2026-06-15 10:07:23,217 | INFO: task submitted: id=070569c9f508415f980745148b566b32, 070569c9f508415f980745148b566b32, case number: 1 (interface.py:248) (3145)
2026-06-15 10:07:23,826 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'name': 'Search Performance Test (1M Dataset, 768 Dim)', 'dataset': {'data': {'name': 'Cohere', 'size': 1000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'ElasticCloud-2026-06-15T10:07:22.760305'}, drop_old=True (interface.py:178) (3180)
2026-06-15 10:07:23,827 | INFO: Starting run (task_runner.py:149) (3180)
2026-06-15 10:07:23,958 | INFO: Elasticsearch client drop_old indices: **vdb_bench_indice** (elastic_cloud.py:56) (3180)
2026-06-15 10:07:25,823 | INFO: Read the entire file into memory: test.parquet (dataset.py:394) (3180)
2026-06-15 10:07:25,923 | INFO: Read the entire file into memory: neighbors.parquet (dataset.py:394) (3180)
2026-06-15 10:07:25,975 | INFO: Start performance case (task_runner.py:194) (3180)
2026-06-15 10:07:26,839 | INFO: (SpawnProcess-1:1) Start concurrent insert, batch_size=100, max_workers=4 (concurrent_runner.py:187) (3320)
2026-06-15 10:07:26,840 | INFO: Get iterator for shuffle_train.parquet (dataset.py:426) (3320)
2026-06-15 10:19:04,362 | INFO: (SpawnProcess-1:1) Finish concurrent insert, count=1000000, dur=697.52s (concurrent_runner.py:208) (3320)
2026-06-15 10:19:05,418 | INFO: Elasticsearch force merge task id: IzIQWvsDRDyJfYi3ILG6IA:25254 (elastic_cloud.py:216) (3374)
2026-06-15 10:36:05,759 | INFO: Finish loading the entire dataset into VectorDB, insert_duration=698.5211805580002, optimize_duration=1020.3182846210002 load_duration(insert + optimize) = 1718.8395 (task_runner.py:204) (3180)
2026-06-15 10:36:06,372 | INFO: Start search 30s in concurrency 1, filters: type=<FilterOp.NonFilter: 'NonFilter'> filter_rate=0.0 gt_file_name='neighbors.parquet' (mp_runner.py:129) (3180)

There are two columns, "id" and "emb", in shuffle_train.parquet, but I can't find the "emb" column in the index above.

$ **parquet-tools csv --head 1 shuffle_train.parquet**
id,emb
322406,"[ 1.96000963e-01 -5.27086198e-01 -2.95191228e-01  4.29556400e-01
  5.14418483e-01  3.23285192e-01  4.47883815e-01 -2.47427240e-01
  2.17925444e-01  2.95179904e-01 -1.87991694e-01 -1.45452484e-01
 -7.53609417e-03  2.48572137e-02 -2.38947198e-01 -5.72574914e-01
  2.85768330e-01 -2.50302762e-01 -1.09715998e-01  2.03979433e-01
 -2.87425637e-01  4.39991504e-01 -4.32560384e-01 -1.68661028e-02
 -1.18690394e-01 -1.56994104e-01 -3.84647399e-01 -2.81384345e-02
 -7.62783408e-01  3.80847305e-01  9.49241042e-01 -3.09303999e-01
  3.34682524e-01  4.52350616e-01 -6.91890001e-01  2.17385769e-01
 -1.60764053e-01  9.34349224e-02 -6.08903706e-01  3.95501107e-01
  4.59643811e-01  1.34819821e-02  5.26180983e-01 -2.78248936e-01
  4.71442789e-01 -4.53977764e-01  4.71780390e-01 -1.68278441e-01
  6.41057193e-02  2.62458622e-01 -1.20296814e-01 -4.32358563e-01
 -5.24910808e-01  1.35188848e-01 -3.00156236e-01 -9.81063619e-02
...

<img width="1920" height="1087" alt="Image" src="https://github.com/user-attachments/assets/153a3507-b52c-4646-9342-f67d68dd8d45" />
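One way to confirm whether the index really has no vector field is to look at the index mapping instead of the documents returned by a search, because Elasticsearch can index a field without returning it in `_source`. The sketch below walks a `GET /<index>/_mapping` response and lists any `dense_vector` fields. The sample mapping is hypothetical (the field name `vector` and the `_source` exclusion are assumptions for illustration, not taken from the actual index); only the index name `vdb_bench_indice` comes from the log above.

```python
from __future__ import annotations

from typing import Any


def find_dense_vector_fields(properties: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths of every field whose mapping type is dense_vector."""
    found: list[str] = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "dense_vector":
            found.append(path)
        # Object and nested fields keep their children under "properties".
        found.extend(find_dense_vector_fields(spec.get("properties", {}), f"{path}."))
    return found


# Hypothetical body of GET /vdb_bench_indice/_mapping (assumed, for illustration).
sample_response = {
    "vdb_bench_indice": {
        "mappings": {
            "_source": {"excludes": ["vector"]},
            "properties": {
                "id": {"type": "integer"},
                "vector": {"type": "dense_vector", "dims": 768},
            },
        }
    }
}

mappings = sample_response["vdb_bench_indice"]["mappings"]
vector_fields = find_dense_vector_fields(mappings["properties"])
excluded_from_source = mappings.get("_source", {}).get("excludes", [])

print("dense_vector fields:", vector_fields)
print("excluded from _source:", excluded_from_source)
```

If the real mapping lists a `dense_vector` field, the vectors were indexed even when they do not show up in search hits; if it lists none, the load did not create a vector field.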

If there is no vector column in the index, how can vector search performance be validated?
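For context on the question, the log above reads `test.parquet` and `neighbors.parquet` and passes `gt_file_name='neighbors.parquet'` to the search stage, which suggests that search quality is scored from the ids each query returns, compared against precomputed ground-truth neighbors, not from vectors read back out of the index. The following is a minimal sketch of a recall@k calculation under that assumption; the function name and the example ids are hypothetical, and this is not necessarily the exact metric VectorDBBench computes.

```python
from __future__ import annotations


def recall_at_k(returned_ids: list[int], ground_truth_ids: list[int], k: int) -> float:
    """Fraction of the top-k ground-truth neighbors found in the top-k returned ids."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth_ids[:k])
    hits = set(returned_ids[:k]) & truth
    return len(hits) / k


# Hypothetical ids for one query: two of the three true neighbors were returned.
returned = [322406, 17, 42]
ground_truth = [322406, 42, 99]
recall = recall_at_k(returned, ground_truth, k=3)
print(f"recall@3 = {recall:.3f}")
```

Only document ids are needed for this kind of check, so it works whether or not the stored documents expose the vector values.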
