Regular Expression Tips

Lookaround assertions are sometimes useful when the thing you want to match is tricky.

  • ?= Positive Lookahead
  • ?<= Positive Lookbehind
  • ?! Negative Lookahead
  • ?<! Negative Lookbehind

(?=foo) asserts that what immediately follows the current position in the string is foo.
(?<=foo) asserts that what immediately precedes the current position in the string is foo.
(?!foo) asserts that what immediately follows the current position in the string is not foo.
(?<!foo) asserts that what immediately precedes the current position in the string is not foo.
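Here is a small sketch of the four assertions in Ruby. The sample string and variable names are made up for illustration; the expected values are shown in the comments.

```ruby
text = "price: $42, qty: 17"

# (?=...) positive lookahead: digits that are followed by a comma.
before_comma = text[/\d+(?=,)/]             # "42"

# (?<=...) positive lookbehind: digits that are preceded by a dollar sign.
after_dollar = text[/(?<=\$)\d+/]           # "42"

# (?!...) negative lookahead: digits not followed by a comma or another digit.
not_before_comma = text[/\d+(?![\d,])/]     # "17"

# (?<!...) negative lookbehind: digits not preceded by "$" or another digit.
not_after_dollar = text[/(?<![\$\d])\d+/]   # "17"
```

Note that the lookaround itself is never part of the match: the comma and the dollar sign are checked but not consumed.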

These can be expensive to compute. With a backtracking regexp engine, the cost of a badly written pattern can grow exponentially with input size, so use this technique very carefully. It could take down the entire internet. Read the excellent postmortem from Cloudflare on their outage on July 2, 2019. Remember, the plural of regexp is regrets.

\p{Word} matches word characters as defined by Unicode, not just ASCII
\p{Space} matches all kinds of whitespace, including Unicode spaces
\p{Blank} matches a space or a tab
\p{Han} matches Han (Chinese) characters
\p{Hiragana} matches Hiragana characters
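A quick sketch of how these properties behave compared to the ASCII-only \w and \s in Ruby. The sample strings are made up for illustration.

```ruby
# \w only knows ASCII word characters; \p{Word} follows Unicode.
cafe = "caf\u00E9"                        # "café" with a precomposed é
ascii_word   = cafe[/\w+/]                # "caf"
unicode_word = cafe[/\p{Word}+/]          # "café"

# \s does not match an ideographic space (U+3000); \p{Space} does.
wide_space = "\u3000"
ascii_space_match   = wide_space.match?(/\s/)          # false
unicode_space_match = wide_space.match?(/\p{Space}/)   # true

# Script properties pick out runs of a given script.
text = "Ruby は 漢字 と ひらがな"
han      = text.scan(/\p{Han}+/)          # ["漢字"]
hiragana = text.scan(/\p{Hiragana}+/)     # ["は", "と", "ひらがな"]
```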

Remember, no matching is faster than any matching! Try to avoid these where you can: transform your input first, eliminate the need to match at all, or match against smaller parts of the string.

  • String#scrub can replace or remove invalid byte sequences (2.1.0+)
  • String#unicode_normalize can normalize Unicode strings (2.2.0+)
  • String#upcase/downcase/swapcase/capitalize handle Unicode case mapping (2.4.0+)
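A minimal sketch of cleaning input with these methods before any matching happens. The strings are made up for illustration, and it assumes a Ruby version new enough for all three methods (2.4.0+).

```ruby
# "\xFF" is not a valid UTF-8 byte sequence.
broken   = "abc\xFFdef"
was_valid = broken.valid_encoding?        # false
removed  = broken.scrub("")               # "abcdef" (invalid bytes dropped)
replaced = broken.scrub                   # invalid bytes become U+FFFD

# "e" + combining acute accent normalizes (NFC by default) to a single "é".
decomposed = "e\u0301"
composed   = decomposed.unicode_normalize # "\u00E9"

# Case mapping works beyond ASCII.
shouted = "stra\u00DFe".upcase            # "STRASSE"
```

After this kind of cleanup, a plain String#== or String#include? is often enough and no regexp is needed.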

handle_regexp = /@\K\w+/
callsign_regexp = /@\w+/

Given @juanitofatas, handle_regexp will match juanitofatas; callsign_regexp will match @juanitofatas. \K tells the regexp engine to drop everything matched so far from the result, so the reported match starts from that point.
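To see the difference, here is a small sketch using the two regexps above. The sample sentence is made up, and the last line shows a lookbehind that should give the same result as \K for this input.

```ruby
handle_regexp   = /@\K\w+/
callsign_regexp = /@\w+/

tweet = "thanks @juanitofatas!"

handle   = tweet[handle_regexp]     # "juanitofatas"
callsign = tweet[callsign_regexp]   # "@juanitofatas"

# For this input, a positive lookbehind gives the same result as \K.
via_lookbehind = tweet[/(?<=@)\w+/] # "juanitofatas"
```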

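An example that matches a quoted string in Ruby, using a named group and a backreference so the closing quote must be the same as the opening one: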

/(?<quote>['"]).*\k<quote>/
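A short sketch of how this behaves. The inputs are made up for illustration; %q{...} is only used here so the samples can contain both kinds of quotes.

```ruby
quoted = /(?<quote>['"]).*\k<quote>/

double_ok  = quoted.match?(%q{say "hello"})    # true
single_ok  = quoted.match?(%q{'it' works})     # true
mismatched = quoted.match?(%q{"mismatched'})   # false

# The other kind of quote is allowed inside the string.
m = quoted.match(%q{x = "it's fine"})
whole = m[0]        # "\"it's fine\""
mark  = m[:quote]   # "\""
```

Because .* is greedy, a line with several quoted strings will match from the first opening quote to the last matching one; .*? is one way to stop at the nearest closing quote instead.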

Hope these are helpful, cheers!