ATLAS-5032: Updated CONTAINS so we only defer to JanusGraph when inde… #672
Open
saksenasonali wants to merge 1 commit into
What changes were proposed in this pull request?
Attributes with indexType STRING are indexed without tokenization (Mapping.STRING), so the tokenization issue applies only to TEXT attributes (indexType == null), not to STRING attributes.
qualifiedName uses the default TEXT indexing, whereas attributes with indexType: STRING (e.g. name, owner) do not have this tokenization problem.
Updated the CONTAINS check so that we fall back to JanusGraph only when indexType == null and the filter value either exceeds the max token length or contains tokenizing characters. Attributes with indexType STRING continue to use index search for long CONTAINS values.
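The revised CONTAINS rule can be sketched in Java as follows. This is a minimal illustration, not the Atlas code: the class and method names (ContainsRouting, useGraphForContains) are hypothetical, indexType is modeled as a plain String ("STRING" or null), and the set of tokenizing characters is an assumed example rather than the real tokenizer's split rules.

```java
import java.util.Set;

// Hypothetical sketch of the revised CONTAINS routing; not the Atlas implementation.
final class ContainsRouting {
    // Solr's documented default maxTokenLength; a deployment may configure another value.
    static final int DEFAULT_MAX_TOKEN_LENGTH = 255;

    // Assumed example of characters a TEXT tokenizer would split on (illustrative only).
    static final Set<Character> TOKENIZE_CHARS = Set.of(' ', '\t', '@', '/', '-');

    // Returns true when a CONTAINS criterion should be evaluated by JanusGraph rather than the index.
    static boolean useGraphForContains(String indexType, String value, int maxTokenLength) {
        // indexType "STRING" is indexed untokenized (Mapping.STRING), so it always stays on the index.
        // Only default TEXT attributes (indexType == null) can fall back to the graph.
        return indexType == null
                && (value.length() > maxTokenLength || containsTokenizeChar(value));
    }

    // True if the value would be split into several tokens by the assumed tokenizer.
    static boolean containsTokenizeChar(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (TOKENIZE_CHARS.contains(value.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String longValue = "x".repeat(300);
        System.out.println(useGraphForContains("STRING", longValue, DEFAULT_MAX_TOKEN_LENGTH));  // false
        System.out.println(useGraphForContains(null, longValue, DEFAULT_MAX_TOKEN_LENGTH));      // true
        System.out.println(useGraphForContains(null, "t1@primary", DEFAULT_MAX_TOKEN_LENGTH));   // true
        System.out.println(useGraphForContains(null, "sales", DEFAULT_MAX_TOKEN_LENGTH));        // false
    }
}
```

The STRING check comes first because, per the description, an untokenized index entry can serve a long CONTAINS value; only TEXT attributes pay the cost of a graph scan.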
ATLAS-5032: Fix basic search for long qualifiedName with startsWith / endsWith / contains
Problem
Basic search with attribute filters on qualifiedName returns no results when filter values exceed Solr’s default max token length (255). This affects startsWith, endsWith, and contains, especially when multiple criteria on the same attribute are combined with AND (e.g. qualifiedName starts with a long prefix and ends with @primary).
Root cause: Solr ignores tokens longer than maxTokenLength, so index-based search does not match even though the entity exists and can be retrieved by GUID.
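The root cause can be illustrated with a toy model. This is not Solr code: TokenDropModel is a hypothetical class, whitespace splitting stands in for the real tokenizer, and over-long tokens are dropped as described above. A qualifiedName such as default.<370-char-name>@primary yields one over-long token, so the index holds nothing for a prefix query to match, while comparing against the stored value directly (roughly what a graph-side filter does) still succeeds.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the root cause (not Solr code): over-long tokens never reach the index.
final class TokenDropModel {
    // Splits on whitespace (a simplification) and ignores tokens longer than maxTokenLength.
    static List<String> indexTokens(String fieldValue, int maxTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String token : fieldValue.split("\\s+")) {
            if (!token.isEmpty() && token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    // An index prefix query can only match tokens that were actually indexed.
    static boolean indexPrefixMatches(List<String> indexedTokens, String prefix) {
        for (String token : indexedTokens) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String prefix = "default." + "a".repeat(370);
        String qualifiedName = prefix + "@primary";
        List<String> tokens = indexTokens(qualifiedName, 255);
        System.out.println(tokens.isEmpty());                    // true: the whole value was dropped
        System.out.println(indexPrefixMatches(tokens, prefix));  // false: index search finds nothing
        System.out.println(qualifiedName.startsWith(prefix));    // true: the stored value does match
    }
}
```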
Solution (Approach 2 from ATLAS-5032)
For indexed string attributes, when the filter value length exceeds the configured Solr token limit, do not use the Solr index for STARTS_WITH, ENDS_WITH, or CONTAINS. Search falls back to JanusGraph instead.
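Approach 2 reduces to a per-criterion length check. A minimal sketch, assuming a hypothetical Operator enum and isIndexSearchable helper (illustrative names, not taken from Atlas); operators outside the three listed are modeled as unaffected by this check. It reflects the rule as stated here; the later CONTAINS refinement in this PR narrows the fallback to TEXT attributes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of Approach 2; the enum and method names are illustrative, not Atlas's.
final class LongValueRouting {
    enum Operator { EQ, STARTS_WITH, ENDS_WITH, CONTAINS }

    // Operators whose index evaluation breaks when the value exceeds the token limit.
    static final Set<Operator> LENGTH_SENSITIVE =
            EnumSet.of(Operator.STARTS_WITH, Operator.ENDS_WITH, Operator.CONTAINS);

    // False means the criterion must be evaluated against JanusGraph instead of Solr.
    static boolean isIndexSearchable(Operator op, String value, int maxTokenLength) {
        return !(LENGTH_SENSITIVE.contains(op) && value.length() > maxTokenLength);
    }
}
```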
Also ensure index and graph query paths stay consistent when the same attribute appears in multiple AND criteria:
Skip graph filter construction when the criterion is still index-searchable.
Skip index query construction when the criterion is not index-searchable.
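The two skip rules amount to partitioning the AND criteria with a single predicate, so each criterion is handled by exactly one path. A minimal sketch with hypothetical Criterion and Split records (Java 16+); for the AND query from the problem statement, the long startsWith criterion goes to the graph and the short endsWith @primary criterion to the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: route each AND criterion to exactly one of the two query paths.
final class CriterionSplitter {
    record Criterion(String attributeName, String operator, String value) {}

    record Split(List<Criterion> indexCriteria, List<Criterion> graphCriteria) {}

    static Split split(List<Criterion> andCriteria, Predicate<Criterion> indexSearchable) {
        List<Criterion> index = new ArrayList<>();
        List<Criterion> graph = new ArrayList<>();
        for (Criterion c : andCriteria) {
            // The same predicate decides both sides: an index-searchable criterion gets no graph
            // filter, and a non-index-searchable one gets no index clause.
            if (indexSearchable.test(c)) {
                index.add(c);
            } else {
                graph.add(c);
            }
        }
        return new Split(List.copyOf(index), List.copyOf(graph));
    }
}
```

In this sketch, the index query narrows candidates with the short criterion and the graph filter then applies the long one, so both AND conditions are still enforced.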
How was this patch tested?
Unit / module tests
EntitySearchProcessorTest — 48 tests, including 6 new ATLAS-5032 scenarios (short and long qualifiedName, hive_table and hive_column, tokenized name, CONTAINS + ENDS_WITH).
Full repository module: mvn -pl repository test — 2391 tests, 0 failures.
mvn -pl common,repository -DskipTests install — build success.
Manual / REST (local Docker Atlas)
Reproduced the JIRA flow against http://localhost:21000/:
Created a hive_table with a ~370-character name and qualifiedName default.<370-char-name>@primary.
Basic search with AND:
qualifiedName startsWith default.<370-char-name>
qualifiedName endsWith @primary
Result: 1 matching entity (previously empty).
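To reproduce the manual check, the AND request can be expressed as a basic-search body. This sketch assumes the Atlas v2 basic-search request shape (typeName, plus entityFilters with a condition and criterion entries carrying attributeName, operator and attributeValue). BasicSearchBody is a hypothetical helper, the repeated "a" stands in for the real ~370-character table name, and values are assumed not to need JSON escaping.

```java
// Hypothetical helper that builds the basic-search body used in the manual test.
final class BasicSearchBody {
    // Values are concatenated as-is, so they must not contain quotes or backslashes.
    static String build(String typeName, String prefix, String suffix) {
        return "{"
                + "\"typeName\":\"" + typeName + "\","
                + "\"excludeDeletedEntities\":true,"
                + "\"entityFilters\":{\"condition\":\"AND\",\"criterion\":["
                + criterion("qualifiedName", "startsWith", prefix) + ","
                + criterion("qualifiedName", "endsWith", suffix)
                + "]}}";
    }

    private static String criterion(String attributeName, String operator, String value) {
        return "{\"attributeName\":\"" + attributeName + "\",\"operator\":\"" + operator
                + "\",\"attributeValue\":\"" + value + "\"}";
    }

    public static void main(String[] args) {
        String longName = "a".repeat(370); // placeholder for the ~370-character table name
        System.out.println(build("hive_table", "default." + longName, "@primary"));
    }
}
```

The printed JSON can then be sent with POST to http://localhost:21000/api/atlas/v2/search/basic using the credentials of the local instance.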