
Speed up bytes.hex() and related pystrhex.c users using SIMD #144015

@gpshead

Description


Feature or enhancement

Proposal:

We consolidated much of our bytes -> hexadecimal string code into one place, Python/pystrhex.c, a while back. It is still written with traditional scalar logic, as is natural: iterate over the bytes and convert each byte's two nibbles into hex digits one at a time. Now that it's all in one place, we can do better.
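For reference, a minimal sketch of that scalar, byte-at-a-time nibble conversion. `hexlify_scalar` is a hypothetical name, and this is not the actual pystrhex.c code; it ignores details such as the optional `sep` argument that bytes.hex() accepts.

```c
#include <stddef.h>

/* Lowercase digits, matching what bytes.hex() produces. */
static const char hexdigits[] = "0123456789abcdef";

/* Hypothetical helper, not the actual pystrhex.c code: writes the
   2*len hex characters for src[0..len) into dst (no NUL added). */
static void hexlify_scalar(const unsigned char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        dst[2 * i]     = hexdigits[c >> 4];    /* high nibble first */
        dst[2 * i + 1] = hexdigits[c & 0x0f];  /* then the low nibble */
    }
}
```

For example, the four bytes `de ad be ef` become the eight characters `deadbeef`: one table lookup per nibble, two output characters per input byte.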

x86_64 and arm64 (aarch64) are guaranteed to have SSE2 and NEON respectively, both of which can process 16 bytes at once. Modern compilers (clang for eons, and more recently GCC 12, released in 2022) abstract the operation we need into a nice builtin function, so we do not even need to write CPU-specific code for this use case. Maintainability win!
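A rough sketch of how a 16-bytes-at-a-time version might look. It assumes the compiler abstraction in question is the GCC/clang vector extension together with `__builtin_shufflevector` (long available in clang, added in GCC 12); the issue does not name the builtin, so that is an assumption, and this is not the code from the PR. `hexlify_simd`, `nibbles_to_ascii`, and `HAVE_SHUFFLEVECTOR` are hypothetical names. Each 16-byte chunk is split into a vector of high nibbles and a vector of low nibbles, both are mapped to ASCII arithmetically, and two constant shuffles interleave them into 32 output characters.

```c
#include <stddef.h>
#include <string.h>

/* Sketch only, not CPython's code: assumes GCC >= 12 or clang, which
   provide __builtin_shufflevector. All names here are hypothetical. */
static const char hexdigits[] = "0123456789abcdef";

#if defined(__has_builtin)
#  if __has_builtin(__builtin_shufflevector)
#    define HAVE_SHUFFLEVECTOR 1
#  endif
#endif

#ifdef HAVE_SHUFFLEVECTOR
/* 16 unsigned bytes: one SSE2 or NEON register's worth. */
typedef unsigned char u8x16 __attribute__((vector_size(16)));

/* Map each nibble (0..15) to its lowercase ASCII hex digit.
   (n + 6) >> 4 is 1 exactly when n >= 10, and 'a' - '0' - 10 == 39. */
static inline u8x16 nibbles_to_ascii(u8x16 n)
{
    return n + '0' + ((n + 6) >> 4) * 39;
}
#endif

/* Write 2*len hex characters for src[0..len) into dst (no NUL added). */
static void hexlify_simd(const unsigned char *src, size_t len, char *dst)
{
    size_t i = 0;
#ifdef HAVE_SHUFFLEVECTOR
    for (; i + 16 <= len; i += 16) {
        u8x16 v;
        memcpy(&v, src + i, 16);               /* unaligned-safe load */
        u8x16 hi = nibbles_to_ascii(v >> 4);   /* high-nibble digits */
        u8x16 lo = nibbles_to_ascii(v & 0x0f); /* low-nibble digits */
        /* Interleave: indices 0-15 select from hi, 16-31 from lo. */
        u8x16 first = __builtin_shufflevector(hi, lo,
            0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23);
        u8x16 second = __builtin_shufflevector(hi, lo,
            8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31);
        memcpy(dst + 2 * i, &first, 16);
        memcpy(dst + 2 * i + 16, &second, 16);
    }
#endif
    /* Scalar tail for the last len % 16 bytes (or everything, without SIMD). */
    for (; i < len; i++) {
        dst[2 * i]     = hexdigits[src[i] >> 4];
        dst[2 * i + 1] = hexdigits[src[i] & 0x0f];
    }
}
```

Because `__builtin_shufflevector` takes constant indices, the sketch uses it only for the fixed interleave and does the nibble-to-digit mapping with vector arithmetic instead of a table lookup. On compilers without the builtin, the `#if` guard leaves only the scalar loop, so the output is identical either way.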

Will it be worthwhile? It turns out the answer is yes (see PR). The gain is minor on something as small as a baseline md5.hexdigest() (16 bytes) but clear on larger data such as sha256.hexdigest() (32 bytes) and sha512.hexdigest() (64 bytes), where 1.5-3x faster hex conversion is common. Realistically, I doubt many applications convert binary data larger than these common, practical cases to hex, but if they do, it can be >10x faster (as measured at 4K).

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere


Metadata


Labels

interpreter-core: Objects, Python, Grammar, and Parser dirs
performance: Performance or resource usage
type-feature: A feature request or enhancement
