Conversation

@kaka11chen
Contributor

What problem does this PR solve?

Problem Summary:

Release note

[Fix] (file-scanner) Fix incorrect query results when a non-deterministic function is pushed down.

  • When rand() is pushed down to the ORC file scan, it becomes a ScanNode conjunct. Because rand() < 1 references no columns, VFileScanner::_process_conjuncts_for_dict_filter sees slot_ids.size() == 0 and returns early.
  • The early return leaves the remaining predicates unprocessed. Immediately afterwards, _process_late_arrival_conjuncts calls _discard_conjuncts(), which clears the original predicates, so predicates such as xxx IS NOT NULL are dropped.
  • With those predicates gone, the Parquet/ORC reader filters only on the predicates that were processed before the early return, so rows where xxx is NULL appear in the result.

In short, pushing down rand() triggers the early return in predicate handling, which discards the other filters. That is why NULLs show up only when rand() is pushed down.
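The failure mode described above can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the Doris code: `Row`, `Conjunct`, `process_conjuncts`, `scan`, and `example_conjuncts` are hypothetical names invented for illustration, and the `stop_on_slotless` flag only stands in for "early return" versus "keep going". How the actual patch handles slot-less conjuncts is not shown in this conversation.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT the Doris types.
using Row = std::optional<int>;  // a single nullable column "xxx"

struct Conjunct {
    std::string name;
    std::vector<int> slot_ids;             // columns the predicate references
    std::function<bool(const Row&)> eval;  // returns true to keep the row
};

// Models the predicate-processing pass. With stop_on_slotless == true, a
// conjunct that references no columns ends the whole pass (the early return),
// so every conjunct after it is lost. With false, the pass keeps going.
std::vector<Conjunct> process_conjuncts(const std::vector<Conjunct>& conjuncts,
                                        bool stop_on_slotless) {
    std::vector<Conjunct> kept;
    for (const auto& c : conjuncts) {
        if (c.slot_ids.empty() && stop_on_slotless) {
            return kept;  // models the early return; later conjuncts never run
        }
        kept.push_back(c);
    }
    return kept;
}

// Applies only the conjuncts that survived processing, as the reader would.
std::vector<Row> scan(const std::vector<Row>& rows,
                      const std::vector<Conjunct>& active) {
    std::vector<Row> out;
    for (const auto& r : rows) {
        bool keep = true;
        for (const auto& c : active) {
            if (!c.eval(r)) {
                keep = false;
                break;
            }
        }
        if (keep) out.push_back(r);
    }
    return out;
}

// rand() < 1 is modelled as always true; it references no column.
std::vector<Conjunct> example_conjuncts() {
    return {
        Conjunct{"rand() < 1", {}, [](const Row&) { return true; }},
        Conjunct{"xxx IS NOT NULL", {0},
                 [](const Row& r) { return r.has_value(); }},
    };
}
```

With rows `{1, NULL, 3}`, `scan(rows, process_conjuncts(example_conjuncts(), true))` returns all three rows, including the NULL one, while passing `false` returns only the two non-NULL rows.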

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 30960 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3c92038a785502dcc43cdbf50496236941812fa6, data reload: false

------ Round 1 ----------------------------------
q1	17621	4721	4555	4555
q2	1986	326	197	197
q3	10283	1336	708	708
q4	10208	851	303	303
q5	7560	2079	1824	1824
q6	190	172	138	138
q7	916	711	590	590
q8	9274	1395	1103	1103
q9	4885	4563	4674	4563
q10	6798	1648	1276	1276
q11	525	295	265	265
q12	341	367	219	219
q13	17799	3779	3086	3086
q14	230	245	217	217
q15	592	523	518	518
q16	631	650	591	591
q17	658	805	489	489
q18	6442	6314	6408	6314
q19	1295	1008	640	640
q20	432	398	239	239
q21	2941	2371	2085	2085
q22	1146	1078	1040	1040
Total cold run time: 102753 ms
Total hot run time: 30960 ms
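
The bot output does not label its columns. Reading the numbers, the first value per query appears to be the cold run, the next two the hot runs, and the last the smaller of the two hot runs; summing the first column of Round 1 gives 102753 and summing the smaller hot run gives 30960, matching the reported totals. A minimal sketch of that arithmetic follows, where the column meaning is an inference and `Totals` and `sum_runs` are hypothetical names:

```cpp
#include <algorithm>
#include <sstream>
#include <string>

struct Totals {
    long cold = 0;  // sum of the first timing column
    long hot = 0;   // sum of the smaller of the two hot runs
};

// Parses whitespace-separated rows of the assumed form
//   name cold hot1 hot2 [best_hot]
// and accumulates the two totals. Rows that do not parse are skipped.
Totals sum_runs(const std::string& table) {
    Totals t;
    std::istringstream lines(table);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string name;
        long cold = 0, hot1 = 0, hot2 = 0;
        if (!(fields >> name >> cold >> hot1 >> hot2)) continue;
        t.cold += cold;
        t.hot += std::min(hot1, hot2);
    }
    return t;
}
```

For the three rows q1, q2, and q9 of Round 1, this yields a cold total of 24492 and a hot total of 9315.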

----- Round 2, with runtime_filter_mode=off -----
q1	4923	4871	4996	4871
q2	336	385	332	332
q3	2398	2940	2537	2537
q4	1503	1848	1411	1411
q5	4472	4365	4366	4365
q6	217	169	126	126
q7	1972	1935	1845	1845
q8	2813	2378	2406	2378
q9	7155	7053	7288	7053
q10	2447	2751	2403	2403
q11	565	486	462	462
q12	724	800	633	633
q13	3377	3947	3103	3103
q14	290	287	274	274
q15	534	499	502	499
q16	601	681	596	596
q17	1125	1245	1293	1245
q18	7466	7366	7091	7091
q19	818	803	790	790
q20	1912	1958	1808	1808
q21	4537	4240	4060	4060
q22	1058	1000	960	960
Total cold run time: 51243 ms
Total hot run time: 48842 ms

@doris-robot

TPC-DS: Total hot run time: 172854 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3c92038a785502dcc43cdbf50496236941812fa6, data reload: false

query5	4690	629	503	503
query6	322	212	198	198
query7	4241	460	280	280
query8	330	246	238	238
query9	8729	2868	2848	2848
query10	471	333	285	285
query11	15127	15084	14824	14824
query12	182	123	124	123
query13	1254	472	395	395
query14	6402	2993	2802	2802
query14_1	2716	2654	2700	2654
query15	209	189	171	171
query16	979	482	467	467
query17	1090	667	588	588
query18	2553	436	333	333
query19	200	184	155	155
query20	120	120	114	114
query21	212	140	125	125
query22	4180	4101	4201	4101
query23	16082	15763	15417	15417
query23_1	15603	15535	15456	15456
query24	7104	1552	1162	1162
query24_1	1167	1171	1170	1170
query25	555	461	414	414
query26	1237	263	153	153
query27	2776	439	273	273
query28	4534	2174	2143	2143
query29	777	545	449	449
query30	315	267	208	208
query31	810	647	551	551
query32	93	81	87	81
query33	536	367	311	311
query34	894	896	539	539
query35	728	766	673	673
query36	877	898	822	822
query37	144	100	93	93
query38	2725	2657	2622	2622
query39	777	757	739	739
query39_1	723	727	732	727
query40	222	137	118	118
query41	72	71	70	70
query42	97	91	95	91
query43	448	471	422	422
query44	1326	738	744	738
query45	193	198	180	180
query46	832	953	581	581
query47	1384	1556	1422	1422
query48	315	321	238	238
query49	606	437	347	347
query50	677	263	223	223
query51	3782	3791	3779	3779
query52	89	93	84	84
query53	200	226	181	181
query54	280	248	252	248
query55	77	79	77	77
query56	301	288	301	288
query57	1011	1029	919	919
query58	269	261	256	256
query59	2054	2125	1956	1956
query60	332	330	319	319
query61	149	145	164	145
query62	394	359	311	311
query63	194	155	162	155
query64	4836	1179	866	866
query65	3790	3741	3791	3741
query66	1433	418	303	303
query67	15572	15566	15525	15525
query68	2388	1060	722	722
query69	400	312	287	287
query70	999	959	929	929
query71	309	298	266	266
query72	5297	3124	3233	3124
query73	607	715	312	312
query74	8785	8752	8578	8578
query75	2302	2540	1888	1888
query76	2276	1051	652	652
query77	361	383	306	306
query78	9677	9776	9082	9082
query79	2834	885	585	585
query80	1724	524	452	452
query81	567	259	236	236
query82	1027	150	116	116
query83	325	260	255	255
query84	259	127	99	99
query85	891	472	419	419
query86	412	294	284	284
query87	2854	2851	2759	2759
query88	3477	2579	2543	2543
query89	305	255	240	240
query90	1964	179	161	161
query91	171	157	133	133
query92	75	73	66	66
query93	1326	1017	645	645
query94	637	333	294	294
query95	578	327	311	311
query96	639	494	222	222
query97	2326	2345	2330	2330
query98	227	203	195	195
query99	600	605	508	508
Total cold run time: 249332 ms
Total hot run time: 172854 ms

@doris-robot

ClickBench: Total hot run time: 26.53 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3c92038a785502dcc43cdbf50496236941812fa6, data reload: false

query1	0.06	0.05	0.04
query2	0.09	0.05	0.05
query3	0.26	0.09	0.09
query4	1.60	0.12	0.11
query5	0.27	0.25	0.26
query6	1.14	0.67	0.65
query7	0.04	0.03	0.03
query8	0.05	0.05	0.04
query9	0.57	0.50	0.48
query10	0.55	0.55	0.54
query11	0.14	0.10	0.09
query12	0.15	0.10	0.10
query13	0.60	0.58	0.58
query14	0.95	0.94	0.94
query15	0.78	0.77	0.79
query16	0.41	0.40	0.40
query17	1.04	1.07	0.98
query18	0.22	0.21	0.20
query19	1.84	1.82	1.89
query20	0.02	0.01	0.02
query21	15.45	0.26	0.13
query22	5.25	0.05	0.05
query23	15.94	0.29	0.10
query24	0.94	0.54	0.17
query25	0.08	0.06	0.08
query26	0.15	0.14	0.13
query27	0.06	0.08	0.05
query28	3.86	1.07	0.88
query29	12.53	3.90	3.16
query30	0.28	0.14	0.12
query31	2.81	0.60	0.39
query32	3.24	0.55	0.45
query33	3.02	2.98	3.19
query34	16.16	5.08	4.39
query35	4.45	4.44	4.49
query36	0.66	0.51	0.48
query37	0.11	0.07	0.07
query38	0.07	0.04	0.04
query39	0.04	0.02	0.02
query40	0.18	0.14	0.14
query41	0.09	0.04	0.04
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 96.24 s
Total hot run time: 26.53 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.46% (19117/36441)
Line Coverage 35.83% (177614/495671)
Region Coverage 32.30% (137266/424997)
Branch Coverage 33.24% (59443/178813)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 50.00% (1/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.11% (26114/35720)
Line Coverage 56.15% (278043/495139)
Region Coverage 53.84% (231237/429510)
Branch Coverage 55.58% (99807/179584)

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Jan 24, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Labels

approved Indicates a PR has been approved by one committer. reviewed

4 participants