posted on 2024-11-01, 02:28 authored by Kangmin Fan, Simon Puglisi, William Smyth, Andrew Turpin
Given a string $x = x[1..n]$, a repetition of period $p$ in $x$ is a substring $u^r = x[i..i+rp-1]$, $p = |u|$, $r \ge 2$, where neither $u = x[i..i+p-1]$ nor $x[i..i+(r+1)p-1]$ is a repetition. The maximum number of repetitions in any string $x$ is well known to be $\Theta(n \log n)$. A run or maximal periodicity of period $p$ in $x$ is a substring $u^r t = x[i..i+rp+|t|-1]$ of $x$, where $u^r$ is a repetition, $t$ is a proper prefix of $u$, and no repetition of period $p$ begins at position $i-1$ of $x$ or ends at position $i+rp+|t|$. In 2000 Kolpakov and Kucherov [J. Discrete Algorithms, 1 (2000), pp. 159-186] showed that the maximum number $\rho(n)$ of runs in any string $x$ is $O(n)$, but their proof was nonconstructive and provided no specific constant of proportionality. At the same time, they presented experimental data strongly suggesting that $\rho(n) < n$. Related work by Fraenkel and Simpson [J. Combin. Theory Ser. A, 82 (1998), pp. 112-120] showed that the maximum number $\sigma(n)$ of distinct squares in any string $x$ satisfies $\sigma(n) < 2n$, while experiment again encourages the belief that in fact $\sigma(n) < n$. In this paper, as a first step toward proving these conjectures, we present a periodicity lemma that establishes limitations on the number and range of periodicities that can occur over a specified range of positions in $x$. We then apply this result to specify corresponding limitations on the occurrence of runs.
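To make the definition of a run concrete, the following is a minimal brute-force sketch in Python. It is not the method of the paper and is far slower than the linear-time algorithms in the literature. It simply enumerates, for each period p, every maximal stretch of the string that has period p, and keeps those of length at least 2p whose generator u is not itself a repetition. The function names `is_primitive` and `find_runs` are illustrative assumptions, and positions are 0-indexed here, whereas the abstract indexes from 1.

```python
from typing import List, Tuple


def is_primitive(u: str) -> bool:
    """True if u is not a repetition, i.e. u != v^k for every k >= 2."""
    # u is primitive exactly when its only occurrences in u + u
    # start at offsets 0 and len(u).
    return (u + u).find(u, 1) == len(u)


def find_runs(x: str) -> List[Tuple[int, int, int]]:
    """Return all runs of x as (start, end, period), 0-indexed and inclusive.

    Brute-force illustration of the definition, not an efficient algorithm.
    """
    n = len(x)
    runs = []
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if x[i] != x[i + p]:
                i += 1
                continue
            # Extend to the right while period p still holds.
            j = i
            while j + p < n and x[j] == x[j + p]:
                j += 1
            end = j + p - 1  # x[i..end] is a maximal stretch with period p
            # Keep it only if it contains at least two copies of u
            # and u itself is not a repetition.
            if end - i + 1 >= 2 * p and is_primitive(x[i:i + p]):
                runs.append((i, end, p))
            i = j  # j is a mismatch position (or the scan is finished)
    return sorted(runs)


if __name__ == "__main__":
    for s in ("aaaa", "abab", "mississippi"):
        print(s, find_runs(s))
```

For `"mississippi"` the sketch reports four runs: `ississi` with period 3 and the three squares `ss`, `ss`, `pp` with period 1. In `"aaaa"` the stretch with period 2 is rejected because its generator `aa` is itself a repetition, so only the period-1 run remains.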