Longest Palindromic Substring

Given a string S, find the longest palindromic substring in S. You may assume that the maximum length of S is 1000, and there exists one unique longest palindromic substring.

Hide Tags

String

Hide Similar Problems

(H) Shortest Palindrome (E) Palindrome Permutation

经典的题。
一开始写的是每个扫描每个字符，并以它为中心，左右展开扫描，复杂度 O(n*n)，空间 O(1)，结果 68 ms，还不错。
这个思路稍微需要注意的是，需要把奇偶长度的结果分开处理。
另外可以不用每次刷新 rst，只需要刷新（ii-pos+1）这个其实位置就好。

然后写了一个 DP 的解法，也是 O(n*n)。写完了去看别人的帖子，才发现无意之间写了个比较另类的 DP……
看到的帖子上的 DP 都是把字符串直接放在两个邻边，然后去判断：
    dp[i, j] = 1                                   if i == j     同一个
             = s[i] == s[j]                        if j = i + 1 相邻
             = s[i] == s[j] && dp[i + 1][j - 1]    if j > i + 1 互为mirror
【补】重新回头琢磨了一下帖子上的这个算法，跟自己的DP比较了一下，在另外的帖子里。

我写的时候是把其中一个反过来的，这样情况就更简单一些，只要两者相等，就继承左上再加一，否则为0。
不过反过来的那一边，到字符串里取值的时候，要把下标换算一下（从 jj 到 kk）。
这样在移动的过程中， ii 是向所在串的尾部移动，jj（kk）则是向头部移动。
这个过程中，可能会出现一段字符串跟另一段字符串正好逆序对上（最后的大集合里有）的情况，
但它们俩并不是同一段落，所以这种情况并不是回文，过滤它的方法就是通过 ii 和 kk 的关系：
mtx[ii][jj] 的格子里，存储的就是这段回文的长度，所以，计算 ii-mtx[ii][jj]+1，结果应该就是这段回文的开头位置，也就是 kk
这样就能准确地滤掉那些不是回文的遥相呼应的段落了。
比较惊讶的是，跑出来是 460 ms! 耗时相当大，应该跟频繁地数组操作，以及冗余的遥相呼应的情况有关吧。。。

除了这两个思路，还有一个极其经典的算法：Manacher’s Algorithm，线性的复杂度！！
它的基本思想首先是利用插入的额外字符，把奇偶回文的情况统统变成了奇数长度的回文，
然后利用已经发现的回文，来辅助该回文覆盖范围内的后续（右边）的点去判断自己的回文范围。

网上对这个算法的讲解很多，有不少讲得比较清楚的：
        Manacher's ALGORITHM， Manacher算法学习与总结，Longest Palindromic Substring Part II
知乎上还专门有复杂度的证明：如何证明Manacher算法的时间复杂度是O(n)?

不过这些讲解里头，有几个地方还是比较绕的：
一个是下标，很多人都用 2*id–i 来表示 i 以 id 为中心点向左映射的点 i‘，看了一会儿才明白，是 id–(i-id)
另一个是分析的层次：
首先是 i 在不在已经 explored 的范围内。不在 –> 重新scan，在的话，则要细分情况。
    对于在范围内的情况，分为三个情形：
         映射点的回文长度小于自己到explored的最右端的距离，
              此时就等于映射点的范围，不存在更长的可能（最右端之前的字母是已知的回文！）。
         映射点的回文长度等于自己到explored的最右端的距离，实际上是映射点的回文正好跟中心点的回文起点（最左端）重合，
              此时虽然也是取映射点的范围，但是仍然存在向右（最右端之外）再看还能让回文延伸的可能性，所以需要继续 scan。
         映射点的回文长度大于自己到explored的最右端的距离，
             此时因为最右端之外还不知道是啥，所以只可能取自己到最右端的距离，并且需要继续 scan。

借用一下别人博客的图片：

按照这个判断，写了一下实现，并且做了精简，跑出来的结果是。。。 4 ms！太猛了。。。

class Solution {
public:
    //------------------------------------- scan from each character -------------------------------------
    string longestPalindrome_scan(string s) {
        int len=s.length(), ll=0;
        string rst="";
        if(len)
        {
            for(int ii=0; ii<len; ++ii)
            {
                int len1=0, len2=0, len3=0, len4=0, jj=ii+1;
                while(len>len1+ii && 0<=ii-len1 && (s[ii-len1] == s[ii+len1]))        //for odd result
                        len2 += 1+(0<len1), len1++;

                if(ii<len-1 && s[ii] == s[jj])
                {
                    while(len>len3+jj && 0<=ii-len3 && (s[ii-len3] == s[jj+len3]))    //for even result
                        len4 += 2, len3++;
                }

                int longest = max(len2, len4), pos = (longest==len2) ? len1 : len3;
                if(ll<longest)
                    ll=longest, rst=s.substr(ii-pos+1, longest);
            }
        }
        return rst;
    }

    //--------------------------------------- DP with extra matrix ---------------------------------------
    string longestPalindrome_dp(string s) {
        int len=s.length(), ll=0, mtx[len][len];
        string rst="";
        for(int ii=0; ii<len; ++ii)
        {
            for(int jj=0; jj<len; ++jj)
            {
                int kk = len-1-jj;
                mtx[ii][jj] = (s[ii]==s[kk]) ? (1+((!ii||!jj) ? 0 : mtx[ii-1][jj-1])) : 0;

                if(ii-mtx[ii][jj]+1==kk && ll<=mtx[ii][jj])
                    ll=mtx[ii][jj], rst=s.substr(kk, ll);   //can save (ii-pos+1) instead of refresh rst
            }
        }
        return rst;
    }

    //--------------------------------------- Manacher’s Algorithm ---------------------------------------
    string longestPalindrome(string s) {
        int len=s.length();
        if(!len)
            return s;

        //Preprocessing of character to avoid even palindrome cases
        string buf = "$#";                                          //unique character as beginning
        for(int ii=0; ii<len; ++ii)
            buf += s[ii], buf += '#';                               //character as gapping
        buf += "&";                                                 //unique character as ending

        //linear calculation to find longest palindrome substring
        int lenb=buf.size(), mxx=0, pos=0;                          //mxx, pos are for final answer
        int lp[lenb], center=0, r=0;
        for(int ii=0; ii<lenb-1; ++ii)                              //no worries of invalid access
        {
            //center+r-1 is the fastest right range explored;
            //if ii is out of this right range, then need scan from ii, so set lp[ii]=0;
            //if ii is in this range, take smaller between:
            //      - ii's left mirror based on center: lp[center-(ii-center)]=lp[center+center-ii];
            //      - and the distacne from ii to the right range explored: (center+r-1) - ii
            // here if they are equal, lp[ii] might be longer with unexplored characters,
            // if not, no necessary to explore more for lp[ii]
            lp[ii] = (center+r-1>ii) ? min(lp[center-(ii-center)], (center+r-1) -ii) : 0;

            //we can check if(!lp[ii] || (center+r-1-ii)==lp[ii]) for cases need scanning,
            //but not necessarily since for unnecessary cases it won't step into while cycle.
            while(buf[ii+lp[ii]]==buf[ii-lp[ii]])                   //scanning
                lp[ii]++;

            if(ii+lp[ii]-1 > center+r-1)                            //exploring new right range
            {
                center=ii, r=lp[ii];                                //update new center and radius

                if(mxx<r)
                    mxx=r, pos=ii;                                  //update longest and its center
            }
        }

        //build answer
        return s.substr((pos-mxx)/2, mxx-1);                        //==(pos/2-1-(mxx/2-1), mxx-1);
    }
};
68 ms, 460 ms, 4 ms.

精简之后的代码逻辑上简练但是不便于理解，精简之前的代码逻辑上更为清晰：

    string longestPalindrome(string s) {
        int len=s.length();
        if(!len)
            return s;

        //Preprocessing of character to avoid even palindrome cases
        string buf = "$#";                                          //unique character as beginning
        for(int ii=0; ii<len; ++ii)
            buf += s[ii], buf += '#';                               //character as gapping
        buf += "&";                                                 //unique character as ending

        //linear calculation to find longest palindrome substring
        int lenb=buf.size(), mxx=0, pos=0;
        int lp[lenb], center=0, r=0;
        for(int ii=0; ii<lenb-1; ++ii)                              //no worries of invalid access
        {
            if(center+r-1>ii)                                       //ii is in explored range
            {
                int right_range=center+r-1, ii2rr=right_range-ii;   //distance from ii to right range
                int offset = ii-center, im = center - offset;       //mirror node of ii based on center
                lp[ii] = min(lp[im], ii2rr);                        //may need scan (not all cases)
            }
            else                                                    //ii is farther than right range
                lp[ii] = 0;                                         //will scan for ii later

            if(!lp[ii] || (center+r-1-ii)==lp[ii])                  //cases need scanning
            {
                while(buf[ii+lp[ii]]==buf[ii-lp[ii]])               //scanning
                    lp[ii]++;
            }

            if(ii+lp[ii]-1 > center+r-1)                            //exploring new right range
            {
                center=ii, r=lp[ii];                                //update new center and radius

                if(mxx<r)
                    mxx=r, pos=ii;                                  //update longest and its center
            }
        }

        //build answer
        return s.substr(pos/2-1-(mxx/2-1), mxx-1);
    }

这里 lp[ii] 里保存的时候，如果只有自己，长度是1，如果扩展到#，或者下一个字母，依次累加，
所以计算最右端的范围的时候，都要减去 1。
最终要把中心点的坐标，换算回原来的字符串下标，最大长度因为是包含中心点以及一侧的字母和#，
所以换算回原来的字符串里生成结果的时候，也需要减去 1（因为总会多包含一个#）。

Dive, Dave !

Pages

Saturday, August 29, 2015

005 - Longest Palindromic Substring

Longest Palindromic Substring

No comments:

Post a Comment

BU

CS.BU

Scientific American Content: Global