Talk:Regular Expressions

About

 * Started: 2 June 2006
 * Size: 2,400 words (Oct 2008)

Depth of Book?
I'm wondering whether this book is going to be simply a reference for programming with regular expressions, or whether it will also cover the mathematical theory behind them. Explaining the theory would require a digression into automata theory, state machines, etc. --Whiteknight (talk) (projects) 01:02, 5 June 2006 (UTC)


 * I think that is perfectly valid content for this book. It should probably be in a chapter devoted to advanced topics. -- franl ✦ talk ✦ 17:11, 5 June 2006 (UTC)

How can we differentiate this content from the Wikipedia article?
The corresponding Wikipedia article – Regular Expressions – is very detailed. How can we differentiate this book's content from that article? Perhaps this book should have a tutorial style? -- franl ✦ talk ✦ 17:22, 5 June 2006 (UTC)


 * I don't know if you have any available, but there are a number of existing books on the subject of regular expressions. You could use them to prepare a book-length outline or plan for this wikibook. Also, a number of different computer languages and tools use regular expressions (Perl, sed/awk, Emacs, PHP, etc.), and you could easily devote a page to the regular expression implementation in each one. You could also include projects that use regular expressions, such as a C preprocessor module, an HTML tag extractor, a loader (with virtual address correction), an automatic spell checker, etc. --Whiteknight (talk) (projects) 17:51, 5 June 2006 (UTC)


 * Note, I am glad to help here; these will be my first contributions to this project! Eagle 101 03:24, 7 June 2006 (UTC)
 * Though I have worked on Wikipedia for 5 months, I know nothing of what to do here... Eagle 101 03:27, 7 June 2006 (UTC)

Automata theory
As per my note above, I have decided to create a new wikibook on the subject of automata theory. Currently, this new book is in the planning stage, and exists only as a preliminary outline HERE on my user page. Since this is only an outline, I would appreciate any comments or suggestions on this matter. --Whiteknight (talk) (projects) 00:19, 12 June 2006 (UTC)

Beginner Mistakes?
I'd like to suggest a section on troubleshooting regex patterns. I've been learning regex over the past week and have written up two problems I ran into on my blog: http://www.mind-manual.com/blog/index.php/2008/08/07/regular-expressions-how-i-hatelove-thee/. I'd add it myself, but I'm not sure whether it fits the context of this book. Cheers!

Renaming chapters and more
I have made edits across the book, summarized here.

In May 2011, chapter Regular Expressions/Syntaxes was split into subchapters:
 * Regular Expressions/syntax/shell regular expression
 * Regular Expressions/syntax/simple regular expression
 * Regular Expressions/syntax/basic regular expression
 * Regular Expressions/syntax/emacs regular expression
 * Regular Expressions/syntax/non posix basic regular expression
 * Regular Expressions/syntax/perl compatible regular expression
 * Regular Expressions/syntax/posix basic regular expression
 * Regular Expressions/syntax/posix extended regular expression

Today, I have renamed the subchapters to make them shallowly named and capitalized:
 * Regular Expressions/Shell Regular Expressions
 * Regular Expressions/Simple Regular Expressions
 * Regular Expressions/Basic Regular Expressions
 * Regular Expressions/Emacs Regular Expressions
 * Regular Expressions/Non-Posix Basic Regular Expressions
 * Regular Expressions/Perl Compatible Regular Expressions
 * Regular Expressions/Posix Basic Regular Expressions
 * Regular Expressions/Posix Extended Regular Expressions

I have made further edits:
 * Rename heading "Tools using this regular expression syntax" to "Use in Tools".
 * Remove the first heading from each chapter that only repeats the chapter name.
 * Remove unneeded redlinks to some keywords, as in this edit.
 * Add section "Links" to some chapters to stand for external links.

--Dan Polansky (discuss • contribs) 09:32, 15 June 2013 (UTC)

Redundant pages
I request that the following pages added in May 2011, redundant to what the book already has, get deleted:
 * Regular Expressions/operator
 * Regular Expressions/operator/asterisk
 * Regular Expressions/operator/backslash
 * Regular Expressions/operator/box
 * Regular Expressions/operator/careted box
 * Regular Expressions/operator/character range
 * Regular Expressions/operator/dot
 * Regular Expressions/operator/parentheses

I especially object to having a subpage per operator. --Dan Polansky (discuss • contribs) 20:18, 23 June 2013 (UTC)

Big caveat on subexpression resolution
As Russ Cox explains, a regex with capture groups often admits several valid ways to assign text to the groups. Implementations resolve this differently: they agree that .* should be "greedy" (i.e. take the longest match), with priority given to the leftmost element, but diverge beyond that. Perl's behavior is difficult to put in words but easy to write in code, whereas POSIX specified a behavior that is easy to put in words but surprisingly hard to write in code. As a result, although most C libraries try to implement the POSIX regex.h, they usually do not get this subexpression behavior right. We should describe this behavior and caution people to manage their expectations.

Cox also mentions that someone figured out how to do it efficiently... decades after the specification was written. At least it's possible now.

See:
 * Russ Cox: https://swtch.com/~rsc/regexp/regexp2.html sections "Ambiguous Submatching", "POSIX Submatching"
 * Haskell Wiki: https://wiki.haskell.org/Regex_Posix, which describes how C libraries fail. It states that "The GLIBC engine has an API for its own GNU regular expression standard which differs from Posix.", which is quite intriguing -- someone should hunt down that standard.
 * POSIX spec: https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html

Artoria2e5 (discuss • contribs) 10:29, 9 February 2024 (UTC)
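
To make the caveat above concrete, here is a minimal sketch of an ambiguous submatch. The pattern, the input string, and the helper name `backtracking_groups` are illustrative choices, not taken from the cited pages. Python's `re` module is a backtracking engine with Perl-style alternation priority, so it shows only one side of the divergence. The POSIX-style assignment appears in a comment as what the leftmost-longest subexpression rule would be expected to give; this snippet cannot check it.

```python
import re

# Hypothetical pattern and input: "abcd" can be split across the three
# groups in more than one way while still matching the whole string.
PATTERN = r"(a|ab)(c|bcd)(d*)"
TEXT = "abcd"


def backtracking_groups(pattern: str, text: str):
    """Return the capture groups chosen by Python's backtracking engine."""
    m = re.match(pattern, text)
    return m.groups() if m else None


# Perl-style priority: the first alternative that leads to an overall
# match wins, so group 1 takes "a" and group 2 has to take "bcd".
print(backtracking_groups(PATTERN, TEXT))  # ('a', 'bcd', '')

# Under the POSIX rule (each subexpression, left to right, takes the
# longest text consistent with the overall match), the expected split
# would instead be ('ab', 'c', 'd'). Not verifiable with `re`.
```

Both assignments cover the same overall match, `abcd`; only the group boundaries differ, which is why the discrepancy is easy to miss until code depends on a particular group.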