Saturday, August 29, 2015

005 - Longest Palindromic Substring

Longest Palindromic Substring

Given a string S, find the longest palindromic substring in S. You may assume that the maximum length of S is 1000, and there exists one unique longest palindromic substring.

Hide Tags

String

Hide Similar Problems

(H) Shortest Palindrome (E) Palindrome Permutation

经典的题。
一开始写的是每个扫描每个字符,并以它为中心,左右展开扫描,复杂度 O(n*n),空间 O(1),结果 68 ms,还不错。
这个思路稍微需要注意的是,需要把奇偶长度的结果分开处理。
另外可以不用每次刷新 rst,只需要刷新 (ii-pos+1)这个其实位置就好。

然后写了一个 DP 的解法,也是 O(n*n)。写完了去看别人的帖子,才发现无意之间写了个比较另类的 DP……
看到的帖子上的 DP 都是把字符串直接放在两个邻边,然后去判断:
    dp[i, j] = 1                                   if i == j     同一个
             = s[i] == s[j]                        if j = i + 1  相邻
             = s[i] == s[j] && dp[i + 1][j - 1]    if j > i + 1  互为mirror
【补】重新回头琢磨了一下帖子上的这个算法,跟自己的DP比较了一下,在另外的帖子里。

我写的时候是把其中一个反过来的,这样情况就更简单一些,只要两者相等,就继承左上再加一,否则为0
不过反过来的那一边,到字符串里取值的时候,要把下标换算一下(从 jj 到 kk)。
这样在移动的过程中, ii 是向所在串的尾部移动,jj(kk) 则是向头部移动
这个过程中,可能会出现一段字符串跟另一段字符串正好逆序对上(最后的大集合里有)的情况,
但它们俩并不是同一段落,所以这种情况并不是回文,过滤它的方法就是通过 ii 和 kk 的关系:
mtx[ii][jj] 的格子里,存储的就是这段回文的长度,所以,计算 ii-mtx[ii][jj]+1,结果应该就是这段回文的开头位置,也就是 kk
这样就能准确地滤掉那些不是回文的遥相呼应的段落了。
比较惊讶的是,跑出来是 460 ms! 耗时相当大,应该跟频繁地数组操作,以及冗余的遥相呼应的情况有关吧。。。

除了这两个思路,还有一个极其经典的算法:Manacher’s Algorithm,线性的复杂度!!
它的基本思想首先是利用插入的额外字符,把奇偶回文的情况统统变成了奇数长度的回文,
然后利用已经发现的回文,来辅助该回文覆盖范围内的后续(右边)的点去判断自己的回文范围。

网上对这个算法的讲解很多,有不少讲得比较清楚的:
        Manacher's ALGORITHMManacher算法学习与总结Longest Palindromic Substring Part II 
知乎上还专门有复杂度的证明:如何证明Manacher算法的时间复杂度是O(n)?

不过这些讲解里头,有几个地方还是比较绕的:
一个是下标,很多人都用 2*id–i 来表示 i 以 id 为中心点向左映射的点 i‘, 看了一会儿才明白,是 id–(i-id)
另一个是分析的层次:
首先是 i 在不在已经 explored 的范围内。不在 –> 重新scan,在的话,则要细分情况。
    对于在范围内的情况,分为三个情形:
         映射点的回文长度小于自己到explored的最右端的距离
              此时就等于映射点的范围,不存在更长的可能(最右端之前的字母是已知的回文!)。
         映射点的回文长度等于自己到explored的最右端的距离,实际上是映射点的回文正好跟中心点的回文起点(最左端)重合,
              此时虽然也是取映射点的范围,但是仍然存在向右(最右端之外)再看还能让回文延伸的可能性,所以需要继续 scan
         映射点的回文长度大于自己到explored的最右端的距离
             此时因为最右端之外还不知道是啥,所以只可能取自己到最右端的距离,并且需要继续 scan

借用一下别人博客的图片:
           

           
按照这个判断,写了一下实现,并且做了精简,跑出来的结果是。。。 4 ms!太猛了。。。

class Solution {
public:
    //------------------------------------- scan from each character -------------------------------------
    string longestPalindrome_scan(string s) {
        int len=s.length(), ll=0;
        string rst="";
        if(len)
        {
            for(int ii=0; ii<len; ++ii)
            {
                int len1=0, len2=0, len3=0, len4=0, jj=ii+1;
                while(len>len1+ii && 0<=ii-len1 && (s[ii-len1] == s[ii+len1]))        //for odd result
                        len2 += 1+(0<len1), len1++;

                if(ii<len-1 && s[ii] == s[jj])
                {
                    while(len>len3+jj && 0<=ii-len3 && (s[ii-len3] == s[jj+len3]))    //for even result
                        len4 += 2, len3++;
                }

                int longest = max(len2, len4), pos = (longest==len2) ? len1 : len3;
                if(ll<longest)
                    ll=longest, rst=s.substr(ii-pos+1, longest);
            }
        }
        return rst;
    }

    //--------------------------------------- DP with extra matrix ---------------------------------------
    string longestPalindrome_dp(string s) {
        int len=s.length(), ll=0, mtx[len][len];
        string rst="";
        for(int ii=0; ii<len; ++ii)
        {
            for(int jj=0; jj<len; ++jj)
            {
                int kk = len-1-jj;
                mtx[ii][jj] = (s[ii]==s[kk]) ? (1+((!ii||!jj) ? 0 : mtx[ii-1][jj-1])) : 0;

                if(ii-mtx[ii][jj]+1==kk && ll<=mtx[ii][jj])
                    ll=mtx[ii][jj], rst=s.substr(kk, ll);   //can save (ii-pos+1) instead of refresh rst
            }
        }
        return rst;
    }
   
    //--------------------------------------- Manacher’s Algorithm ---------------------------------------
    string longestPalindrome(string s) {
        int len=s.length();
        if(!len)
            return s;
       
        //Preprocessing of character to avoid even palindrome cases
        string buf = "$#";                                          //unique character as beginning
        for(int ii=0; ii<len; ++ii)
            buf += s[ii], buf += '#';                               //character as gapping
        buf += "&";                                                 //unique character as ending

        //linear calculation to find longest palindrome substring
        int lenb=buf.size(), mxx=0, pos=0;                          //mxx, pos are for final answer
        int lp[lenb], center=0, r=0;
        for(int ii=0; ii<lenb-1; ++ii)                              //no worries of invalid access
        {
            //center+r-1 is the fastest right range explored;
            //if ii is out of this right range, then need scan from ii, so set lp[ii]=0;
            //if ii is in this range, take smaller between:
            //      - ii's left mirror based on center: lp[center-(ii-center)]=lp[center+center-ii];
            //      - and the distacne from ii to the right range explored: (center+r-1) - ii
            //  here if they are equal, lp[ii] might be longer with unexplored characters,
            //  if not, no necessary to explore more for lp[ii]

            lp[ii] = (center+r-1>ii) ? min(lp[center-(ii-center)], (center+r-1) -ii) : 0;

            //we can check if(!lp[ii] || (center+r-1-ii)==lp[ii]) for cases need scanning,
            //but not necessarily since for unnecessary cases it won't step into while cycle.

            while(buf[ii+lp[ii]]==buf[ii-lp[ii]])                   //scanning
                lp[ii]++;

            if(ii+lp[ii]-1 > center+r-1)                            //exploring new right range
            {
                center=ii, r=lp[ii];                                //update new center and radius

                if(mxx<r)
                    mxx=r, pos=ii;                                  //update longest and its center

            }
        }

        //build answer
        return s.substr((pos-mxx)/2, mxx-1);                        //==(pos/2-1-(mxx/2-1), mxx-1);
    }
};
68 ms, 460 ms, 4 ms.

精简之后的代码逻辑上简练但是不便于理解,精简之前的代码逻辑上更为清晰:

    string longestPalindrome(string s) {
        int len=s.length();
        if(!len)
            return s;
       
        //Preprocessing of character to avoid even palindrome cases
        string buf = "$#";                                          //unique character as beginning
        for(int ii=0; ii<len; ++ii)
            buf += s[ii], buf += '#';                               //character as gapping
        buf += "&";                                                 //unique character as ending

        //linear calculation to find longest palindrome substring
        int lenb=buf.size(), mxx=0, pos=0;
        int lp[lenb], center=0, r=0;
        for(int ii=0; ii<lenb-1; ++ii)                              //no worries of invalid access
        {
            if(center+r-1>ii)                                       //ii is in explored range
            {
                int right_range=center+r-1, ii2rr=right_range-ii;   //distance from ii to right range
                int offset = ii-center, im = center - offset;       //mirror node of ii based on center
                lp[ii] = min(lp[im], ii2rr);                        //may need scan (not all cases)
            }
            else                                                    //ii is farther than right range
                lp[ii] = 0;                                         //will scan for ii later

            if(!lp[ii] || (center+r-1-ii)==lp[ii])                  //cases need scanning
            {
                while(buf[ii+lp[ii]]==buf[ii-lp[ii]])               //scanning
                    lp[ii]++;
            }  

            if(ii+lp[ii]-1 > center+r-1)                            //exploring new right range
            {
                center=ii, r=lp[ii];                                //update new center and radius

                if(mxx<r)
                    mxx=r, pos=ii;                                  //update longest and its center
            }
        }

        //build answer
        return s.substr(pos/2-1-(mxx/2-1), mxx-1);
    }

这里 lp[ii] 里保存的时候,如果只有自己,长度是1,如果扩展到#,或者下一个字母,依次累加,
所以计算最右端的范围的时候,都要减去 1。
最终要把中心点的坐标,换算回原来的字符串下标,最大长度因为是包含中心点以及一侧的字母和#,
所以换算回原来的字符串里生成结果的时候,也需要减去 1(因为总会多包含一个#)。

No comments:

Post a Comment