RMIT University
Cache-efficient string sorting using copying

journal contribution
posted on 2024-11-01, 03:15 authored by Ranjan Sinha, Justin Zobel
Burstsort is a cache-oriented sorting technique that uses a dynamic trie to efficiently divide large sets of string keys into related subsets small enough to sort in cache. In our original burstsort, string keys sharing a common prefix were managed via a bucket of pointers represented as a list or array; this approach was found to be up to twice as fast as the previous best string sorts, mostly because of a sharp reduction in out-of-cache references. In this paper, we introduce C-burstsort, which copies the unexamined tail of each key to the bucket and discards the original key to improve data locality. On both Intel and PowerPC architectures, and on a wide range of string types, we show that sorting is typically twice as fast as our original burstsort and four to five times faster than multikey quicksort and previous radixsorts. A variant that copies both suffixes and record pointers to buckets, CP-burstsort, uses more memory, but provides stable sorting. In current computers, where performance is limited by memory access latencies, these new algorithms can dramatically reduce the time needed for internal sorting of large numbers of strings.
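The copying idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Node`, `insert`, `collect`, `burstsort`, and the constant `BURST_THRESHOLD` are hypothetical, the threshold is arbitrarily small for illustration, and Python lists stand in for the contiguous byte buckets a cache-conscious version would use. It shows only the general shape: keys descend a trie one byte at a time, the unexamined tail of each key is copied into a bucket, a bucket that grows too large is "burst" into a new trie node, and a final in-order traversal sorts each small bucket.

```python
BURST_THRESHOLD = 4  # assumption: a real implementation would size buckets to fit in cache


class Node:
    """Trie node. Each child is either another Node or a bucket (list of key tails)."""

    def __init__(self):
        self.children = {}  # byte value -> Node or list of bytes
        self.ended = 0      # number of keys that end exactly at this node


def insert(root, key):
    node, depth = root, 0
    while True:
        if depth == len(key):
            node.ended += 1
            return
        c = key[depth]
        child = node.children.get(c)
        if isinstance(child, Node):
            node, depth = child, depth + 1
            continue
        if child is None:
            child = node.children[c] = []
        # Copy only the unexamined tail; the original key is no longer needed.
        child.append(key[depth + 1:])
        if len(child) > BURST_THRESHOLD:
            # Burst: replace the bucket with a trie node and redistribute the tails.
            new_node = Node()
            for tail in child:
                insert(new_node, tail)
            node.children[c] = new_node
        return


def collect(node, prefix, out):
    # Keys that end here sort before any longer key sharing the same prefix.
    out.extend([prefix] * node.ended)
    for c in sorted(node.children):
        child = node.children[c]
        p = prefix + bytes([c])
        if isinstance(child, Node):
            collect(child, p, out)
        else:
            # Each bucket is small, so sorting it touches little memory.
            out.extend(p + tail for tail in sorted(child))


def burstsort(keys):
    root = Node()
    for key in keys:
        insert(root, key)
    out = []
    collect(root, b"", out)
    return out
```

Calling `burstsort([b"band", b"banana", b"apple", b"ban"])` returns the keys in byte-wise lexicographic order. Because only tails are stored, this sketch is not stable and cannot carry record pointers; the CP-burstsort variant described above would additionally keep a pointer alongside each tail.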

Related Materials

  1. ISSN - Is published in 1084-6654

Journal

Journal of Experimental Algorithmics

Volume

11

Issue

1.2

Start page

1

End page

32

Total pages

32

Publisher

Association for Computing Machinery

Place published

US

Language

English

Copyright

© 2006 ACM

Former Identifier

2006001880

Esploro creation date

2020-06-22

Fedora creation date

2009-02-27
