Dive, Dave !: 044 - Wildcard Matching

Monday, September 7, 2015

044 - Wildcard Matching

Implement wildcard pattern matching with support for '?' and '*'.

'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false

Hide Tags
Dynamic Programming Backtracking Greedy String
Hide Similar Problems
(H) Regular Expression Matching
一开始没什么头绪，尝试 DP 却一直没有想好星号怎么处理。（想来想去都没想到加一行一列表示各自为空的情况）
参考了一下网上的帖子，写了一个 DP，大集合有个 4090+ 长度的字符串，断然是 stack overflow 了。
改用 2 个数组滚动之后，可以通过，但是效果并不理想，要 364 ms 之多。

实际上在 DP 之前，还做了一个解法，思路是参考另一个网上的帖子来的：
当出现星号的时候，记住 p 和 s 里各自的位置，用星号后面的字母来跟 s 当前位置继续比较，此时星号等于空串；
如果相同则继续；如果比到不同的，则回头，把 s 的当前位置向右移动一位，也就相当于星号多替代了一个字符。
但若是字符不同，且不是？，且没有星号，则返回 false。
这样持续“比较/回溯”，直到 s 处理完，比较结束，如果 p 还有剩余的星号则持续略过，
最后看是否 p 也走到头了，是的话，就说明是个完全匹配了。

一开始写的时候，自以为是地把结束符也作为普通字符，放在 while 循环里一起处理了，看起来代码简洁一些，
虽然能够通过，但性能只是 120 ms，并不理想。DP 的性能出来之后就更纳闷了……
而后看到讨论区里基本一样的逻辑，跑出来几乎 1/6 的耗时，试着把结束符的处理从 while 循环里拿掉，
再加上第二个 while 去处理尾巴上有更多字符的情况，直接就到了 16 ms！！
看来主要的原因是尾巴上的字符要逐一去跟 s 的结束符比较，过多地做了不必要的比较判断导致额外的耗时。

class Solution {
public:
    bool isMatch_dp(string s, string p){
        int slen=s.length()+1, plen=p.length()+1;

        //bool dp[slen][plen]; //stack overflow for huge long string
        bool xx=true, dp[2][plen];                          //actually we need only 2 rows
        for(int x=0; x<slen; ++x)
        {
            xx = !xx;
            dp[xx][0] = (!x);                               //initialization as if p is ""
            for(int y=1; y<plen; ++y)
            {
                if(!x)
                    dp[0][y] = ('*'==p[y-1] && dp[0][y-1]);
                else
                {
                    if( ((s[x-1]==p[y-1] || '?'==p[y-1]) && dp[!xx][y-1])           //cases except '*'
                        ||(('*' == p[y-1]) && (dp[!xx][y]||dp[!xx][y-1]||dp[xx][y-1])) )    //case '*'                        dp[xx][y] = true;
                    else
                        dp[xx][y] = false;
                }
            }
        }
        return dp[xx][plen-1];
    }

    bool isMatch(string s, string p) {
        int sidx=0, pidx=0, spos=-1, ppos=-1, slen=s.length(), plen=p.length();

        while(sidx<slen)
        {
            if((s[sidx] == p[pidx]) || ('?' == p[pidx]))
                sidx++, pidx++;

            //mark position if there is a '*'
            else if('*' == p[pidx])
                spos=sidx, ppos=pidx++; //try to match s[sidx] with next character of p[pidx] ('*')

            //if arrives here, means mismatch, need check if there is a '*' apprears previously
            //if yes, then go back and re-check for characters after '*' by shift one character in s
            else if(-1!=ppos)
            {
                pidx = ppos+1;          //in p, go back to the character after '*'
                sidx = (++spos);        //in s, shift one character for re-checking
            }

            //if no, return false
            else
                return false;
        }

        //case that p has extra '*' while we've hit '\0' of s
        while('*'==p[pidx])
            pidx++;

        return (pidx==plen);
    }
};364 ms, 16 ms.

Dive, Dave !

Pages

Monday, September 7, 2015

044 - Wildcard Matching

No comments:

Post a Comment

BU

CS.BU

Scientific American Content: Global