-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Feature](paimon) Integrate paimon-cpp library for native Paimon table support #60194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces an optional paimon-cpp–based reader path for Paimon table scans, including third-party integration, FE session plumbing, and BE scan/reader implementation.
Changes:
- Add paimon-cpp as a third-party dependency (download, patching, build/install integration).
- Add FE session variable + thrift propagation to enable paimon-cpp reader, and adjust Paimon split serialization/table-location passing.
- Add BE paimon-cpp reader, predicate pushdown conversion, and a Doris-backed filesystem adapter for paimon-cpp.
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| thirdparty/vars.sh | Adds paimon-cpp archive metadata and includes it in the third-party download list. |
| thirdparty/patches/paimon-cpp-b1ffd6f73e5edf57aac24ec2eaf6d2ef9e9a9850.patch | Applies Doris integration changes to paimon-cpp (CMake options, ORC compat, external deps, etc.). |
| thirdparty/download-thirdparty.sh | Applies the paimon-cpp patch during third-party source preparation. |
| thirdparty/build-thirdparty.sh | Adds paimon-cpp build/install steps and adjusts protobuf build flags. |
| gensrc/thrift/PaloInternalService.thrift | Adds enable_paimon_cpp_reader to TQueryOptions. |
| fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java | Adds session variable enable_paimon_cpp_reader and extends ignore-split enum/options. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonSource.java | Adds table-location extraction helper for passing to BE. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonScanNode.java | Switches split serialization for paimon-cpp and passes table location via thrift. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/PaimonUtil.java | Adds native Paimon DataSplit serialization (standard Base64) for BE paimon-cpp. |
| fe/be-java-extensions/paimon-scanner/src/main/java/org/apache/doris/paimon/PaimonUtils.java | Adds Base64 decode fallback for non-URL-safe encoding. |
| build.sh | Threads paimon-cpp enablement and PAIMON_HOME through to BE CMake configure. |
| bin/start_be.sh | Prepends ${DORIS_HOME}/lib to LD_LIBRARY_PATH. |
| be/src/vec/exec/scan/file_scanner.cpp | Routes paimon scans to paimon-cpp reader when enabled; adds predicate pushdown wiring. |
| be/src/vec/exec/format/table/paimon_predicate_converter.h | Declares conversion from Doris VExpr predicates to paimon-cpp predicates. |
| be/src/vec/exec/format/table/paimon_predicate_converter.cpp | Implements predicate conversion (binary ops, IN, is null, prefix-like). |
| be/src/vec/exec/format/table/paimon_doris_file_system.h | Declares force-link registration helper for paimon-cpp FS factory. |
| be/src/vec/exec/format/table/paimon_doris_file_system.cpp | Implements paimon-cpp filesystem backed by Doris FileFactory/IO stack. |
| be/src/vec/exec/format/table/paimon_cpp_reader.h | Declares the paimon-cpp reader implementation for BE scans. |
| be/src/vec/exec/format/table/paimon_cpp_reader.cpp | Implements split decode, read context setup, batch import via Arrow C bridge, and block materialization. |
| be/src/tools/CMakeLists.txt | Adds an optional paimon_cpp_demo tool when ENABLE_PAIMON_CPP is ON. |
| be/CMakeLists.txt | Adds ENABLE_PAIMON_CPP/PAIMON_HOME options and links paimon-cpp static libs and dependencies into BE. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Set table location for paimon-cpp reader | ||
| String tableLocation = source.getTableLocation(); | ||
| if (tableLocation != null) { | ||
| fileDesc.setPaimonTable(tableLocation); | ||
| } |
Copilot
AI
Jan 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TPaimonFileDesc.paimon_table is marked deprecated in gensrc/thrift/PlanNodes.thrift. Reusing it for paimon-cpp table location makes the API harder to evolve and risks removal/behavior changes. Prefer adding a new non-deprecated thrift field (or updating the thrift definition/comment if this field is intended to be used again) for the paimon-cpp table root/location.
| // Set table location for paimon-cpp reader | |
| String tableLocation = source.getTableLocation(); | |
| if (tableLocation != null) { | |
| fileDesc.setPaimonTable(tableLocation); | |
| } |
| return to_paimon_status(exists_status); | ||
| } | ||
| if (exists) { | ||
| return Status::Exist("file already exists: ", normalized_path); |
Copilot
AI
Jan 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call returns Status::Exist with two arguments. Elsewhere in this file paimon::Status helpers are called with a single message string, so this likely won’t compile against paimon::Status API. Consider building a single message string (e.g., concatenation/format) and passing that to Status::Exist, or use the correct overload if one exists.
| return Status::Exist("file already exists: ", normalized_path); | |
| return Status::Exist(std::string("file already exists: ") + normalized_path); |
| @@ -0,0 +1,82 @@ | ||
| +#!/bin/bash | ||
| +# Paimon-cpp 依赖库检查脚本 | ||
| + | ||
| +# 默认 PAIMON_HOME | ||
| +PAIMON_HOME="${PAIMON_HOME:-/mnt/disk1/chenjunwei/paimon-install-asan}" |
Copilot
AI
Jan 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dependency check script added to the paimon-cpp patch hardcodes a developer-specific default PAIMON_HOME ("/mnt/disk1/chenjunwei/paimon-install-asan"). This will be wrong in almost all environments and can mislead users. Prefer no default (require PAIMON_HOME to be set) or derive it relative to the script location, and keep the output/messages environment-agnostic.
| @@ -0,0 +1,82 @@ | |
| +#!/bin/bash | |
| +# Paimon-cpp 依赖库检查脚本 | |
| + | |
| +# 默认 PAIMON_HOME | |
| +PAIMON_HOME="${PAIMON_HOME:-/mnt/disk1/chenjunwei/paimon-install-asan}" | |
| @@ -0,0 +1,86 @@ | |
| +#!/bin/bash | |
| +# Paimon-cpp 依赖库检查脚本 | |
| + | |
| +# 默认 PAIMON_HOME:优先使用环境变量,否则相对于当前脚本目录推导 | |
| +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | |
| +if [ -z "${PAIMON_HOME:-}" ]; then | |
| + PAIMON_HOME="$(cd "$SCRIPT_DIR/.." && pwd)" | |
| +fi |
| # Paimon static library dependencies | ||
| set(paimon_fmt_lib "${PAIMON_HOME}/lib64/paimon_deps/libfmtd.a") | ||
| if (NOT EXISTS "${paimon_fmt_lib}") | ||
| set(paimon_fmt_lib "${PAIMON_HOME}/lib64/paimon_deps/libfmt.a") | ||
| endif() | ||
| add_library(paimon_fmt STATIC IMPORTED) | ||
| set_target_properties(paimon_fmt PROPERTIES IMPORTED_LOCATION "${paimon_fmt_lib}") | ||
| set(paimon_tbb_lib "${PAIMON_HOME}/lib64/paimon_deps/libtbb_debug.a") | ||
| if (NOT EXISTS "${paimon_tbb_lib}") | ||
| set(paimon_tbb_lib "${PAIMON_HOME}/lib64/paimon_deps/libtbb.a") | ||
| endif() | ||
| add_library(paimon_tbb STATIC IMPORTED) | ||
| set_target_properties(paimon_tbb PROPERTIES IMPORTED_LOCATION "${paimon_tbb_lib}") | ||
| add_library(paimon_roaring STATIC IMPORTED) | ||
| set_target_properties(paimon_roaring PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libroaring_bitmap.a") | ||
| add_library(paimon_xxhash STATIC IMPORTED) | ||
| set_target_properties(paimon_xxhash PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libxxhash.a") | ||
| add_library(paimon_arrow_dataset STATIC IMPORTED) | ||
| set_target_properties(paimon_arrow_dataset PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libarrow_dataset.a") | ||
| add_library(paimon_arrow_acero STATIC IMPORTED) | ||
| set_target_properties(paimon_arrow_acero PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libarrow_acero.a") | ||
| add_library(paimon_arrow STATIC IMPORTED) | ||
| set_target_properties(paimon_arrow PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libarrow.a") | ||
| set(paimon_arrow_bundled_deps_lib "${PAIMON_HOME}/lib64/paimon_deps/libarrow_bundled_dependencies.a") | ||
| if (EXISTS "${paimon_arrow_bundled_deps_lib}") | ||
| add_library(paimon_arrow_bundled_dependencies STATIC IMPORTED) | ||
| set_target_properties(paimon_arrow_bundled_dependencies PROPERTIES IMPORTED_LOCATION "${paimon_arrow_bundled_deps_lib}") | ||
| endif() | ||
| add_library(paimon_parquet STATIC IMPORTED) | ||
| set_target_properties(paimon_parquet PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libparquet.a") | ||
| if(NOT PAIMON_USE_EXTERNAL_ORC) | ||
| add_library(paimon_orc STATIC IMPORTED) | ||
| set_target_properties(paimon_orc PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/liborc.a") | ||
| endif() |
Copilot
AI
Jan 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The imported paimon dependency libraries (fmt/tbb fallbacks, roaring/xxhash/arrow/parquet/orc) are not validated after selecting paths. If any file is missing, CMake will fail later with a less actionable error. Consider adding explicit EXISTS checks and a clear FATAL_ERROR listing the missing artifacts under PAIMON_HOME, especially after the fallback selection for fmt/tbb.
| #include "vec/exec/format/table/max_compute_jni_reader.h" | ||
| #include "vec/exec/format/table/paimon_cpp_reader.h" | ||
| #include "vec/exec/format/table/paimon_predicate_converter.h" |
Copilot
AI
Jan 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These paimon-cpp headers are included unconditionally, but the project only adds PAIMON_HOME include paths when ENABLE_PAIMON_CPP is ON. With ENABLE_PAIMON_CPP=OFF this will fail to compile due to missing "paimon/..." headers (and Vec/CMakeLists.txt globs all *.cpp, so paimon_cpp_reader.cpp will be built too). Please gate the includes/usage (and the new paimon-cpp .cpp files) behind the CMake option (e.g., exclude sources when disabled and/or add a compile definition when enabled).
| #include "vec/exec/format/table/max_compute_jni_reader.h" | |
| #include "vec/exec/format/table/paimon_cpp_reader.h" | |
| #include "vec/exec/format/table/paimon_predicate_converter.h" | |
| #include "vec/exec/format/table/max_compute_jni_reader.h" | |
| #ifdef ENABLE_PAIMON_CPP | |
| #include "vec/exec/format/table/paimon_cpp_reader.h" | |
| #include "vec/exec/format/table/paimon_predicate_converter.h" | |
| #endif |
| public String getTableLocation() { | ||
| if (originTable instanceof FileStoreTable) { | ||
| return ((FileStoreTable) originTable).location().toString(); | ||
| } | ||
| // Fallback to path option | ||
| return originTable.options().get("path"); | ||
| } |
Copilot
AI
Jan 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getTableLocation() dereferences originTable without a null check, but the @VisibleForTesting no-arg constructor explicitly sets originTable = null. This will now throw NPE in unit tests (and any test helpers using the no-arg constructor). Consider returning null when originTable is null (or throwing a clearer exception in non-test paths).
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
FE UT Coverage ReportIncrement line coverage |
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
FE UT Coverage ReportIncrement line coverage |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
When building paimon-cpp in Doris thirdparty, compilation fails due to ORC API incompatibility.
Paimon-cpp defaults to using ORC 2.x APIs, but Doris thirdparty has pre-built ORC 1.9.0. The API differences between ORC 1.x and 2.x cause compilation errors like:
readAsyncoverride error (function not virtual in ORC 1.x)getHighBits()/getLowBits()const mismatchsetDatamethod not foundThis PR enables
PAIMON_ORC_V1=ONflag to use the ORC 1.x compatibility wrapper that already exists in paimon-cpp codebase.Release note
None
Check List (For Author)
Test
Manual test steps: