Regular expression for the strings without a particular substring

Question Detail: 

How can we design a regular expressions without particular substrings. The goal of this is to create language L which won't contain a particular substring (i.e. 110)

for the case of a regular expression without substring $110$, I Was thinking of: $\require{cancel}\cancel{(101)^*+}(010)^*+(10)^*+(\cancel{1}1)^*+(\cancel{0}0)^*+(01)^*$ but is that over excessive?

Then for example, I crossed out (101)* because obviously if you have two of those 101101, a subset of that will be 110, which we don't want.

Notes:

Question has been edited since it gained attention in the past few days. Also see comment for justification.

Asked By : Iancovici
Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/11787

Answered By : Wandering Logic

I did the following multistep technique.

Step 1: make a DFA that accepts all strings with "110". I am able to construct a DFA that accepts all strings with substring "110" and has just 4 states.

Step 2: Flip all the accepting states to non-accepting and all the non-accepting states to accepting. This page at Old Dominion describes the technique quite succinctly. This creates a DFA that accepts the language you want.

Step 3: Now you need to convert the DFA back to a regular expression. As @Alejandro Sazo mentioned there is a proof that this is possible. But you need an algorithm. This paper by Christoph Neumann describes three different techniques. The easiest one is the "state removal technique." You remove a state and replace all the edges between states that were connected to the removed state with edges labeled with regular expressions. As you reduce in this way you eventually get to a regular expression for the whole DFA. (Note that Figure 4 in the paper I linked is not quite right. The self-loop on state $q_j$ should have the regular expression "c e* b", not "c e* d".)

By going through this process it became clear that: the language you are going to need to solve for this particular problem has the property that once you have seen the sub-string "11" then you can never see another "0".

No comments

Powered by Blogger.