Fastest Way to Find Substring Occurrences in PHP

I was recently writing a bit of code where I needed to count the occurrences of many different substrings in a long string. Think of finding a three- to five-word phrase in an entire book. After some profiling and fixing a few other issues, it became apparent that calling substr_count a hundred thousand times was eating the lion's share of the execution time.


So I started looking into it. I didn't quite know where to look, since a native PHP function is generally the fastest way to do things. After trying a couple of things that didn't pan out, I thought, "Well, I've tried everything else, so I might as well see how a regex compares." To my considerable surprise, preg_match_all was far faster than substr_count at counting substring occurrences in longer strings.
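For reference, here is a minimal sketch of the two approaches side by side. The helper names and the sample text are my own placeholders for illustration, not code from the project. Both functions count non-overlapping, case-sensitive matches, so they should agree for a literal needle as long as it is passed through preg_quote first.

```php
<?php
// Hypothetical helper names, used only for illustration.

// Count occurrences with the plain string function.
function countWithSubstrCount(string $haystack, string $needle): int
{
    return substr_count($haystack, $needle);
}

// Count occurrences with a regex built from the literal needle.
function countWithPregMatchAll(string $haystack, string $needle): int
{
    // preg_quote escapes regex metacharacters (and the "/" delimiter)
    // so the needle is matched literally.
    $pattern = '/' . preg_quote($needle, '/') . '/';

    // preg_match_all returns the number of full matches, or false on error.
    $count = preg_match_all($pattern, $haystack);

    return $count === false ? 0 : $count;
}

// Assumed sample text standing in for a much longer string.
$text = 'the quick brown fox jumps over the lazy dog; the quick brown fox naps';

echo countWithSubstrCount($text, 'the quick brown'), "\n";
echo countWithPregMatchAll($text, 'the quick brown'), "\n";
```

Skipping preg_quote would let characters such as "." or "?" in the needle act as regex operators, which would quietly change the counts.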


As an example, a test script I wrote shows that using preg_match_all instead of substr_count takes about one-fourth of the time. If you'd like to see the test script, it's part of my PHPPerformanceExamples repo on GitHub.
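If you just want to try the comparison yourself, a rough timing harness might look something like the sketch below. This is not the script from the repo. The timeCalls helper, the generated haystack, and the iteration count are all assumptions chosen for illustration, and the numbers you get will depend on your PHP version, the string length, and the needle.

```php
<?php
// Hypothetical timing helper: runs $fn $iterations times and
// returns the elapsed wall-clock time in seconds.
function timeCalls(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Assumed sample data: a long string with a short phrase repeated throughout.
$haystack = str_repeat('lorem ipsum dolor sit amet ', 20000);
$needle = 'dolor sit amet';

// Build the regex once, escaping the needle so it is matched literally.
$pattern = '/' . preg_quote($needle, '/') . '/';

$substrTime = timeCalls(function () use ($haystack, $needle) {
    substr_count($haystack, $needle);
}, 200);

$pregTime = timeCalls(function () use ($haystack, $pattern) {
    preg_match_all($pattern, $haystack);
}, 200);

printf("substr_count:   %.4f s\n", $substrTime);
printf("preg_match_all: %.4f s\n", $pregTime);
```

It is worth running this against your own data before switching, since the gap may differ for short strings or for needles that rarely occur.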


Thanks for stopping by; I'll see you next time.