Monday, August 31, 2015

132 - Palindrome Partitioning II

Palindrome Partitioning II

Given a string s, partition s such that every substring of the partition is a palindrome.

Return the minimum cuts needed for a palindrome partitioning of s.

For example, given s = "aab",
Return 1 since the palindrome partitioning ["aa","b"] could be produced using 1 cut.

Hide Tags

Dynamic Programming

Hide Similar Problems

(M) Palindrome Partitioning

照猫画虎把 Palindrome Partitioning 的代码拿来改了改提交断然是超时…，看来提示 DP 是有原因的。
没怎么想出 DP 怎么递推，只好去查帖子。
虽然讲这道题的帖子不少，但好多都是聊聊数语一笔带过，看来看去总觉得理解不是很透彻。

后来遇到这个帖子，基本上理解清楚了：
核心的思想是用一个一维数组来记录从开始到每个字符所需要切的最小刀数，初始化为字符的下标（也就是最大可能的刀数）
然后利用二维矩阵来判断子串 s[substr_start, substr_end] 是不是一个回文（下图蓝色的两个格子之间，含它们），
如果是，那么就可以在 substr_start 之前切一刀（红色的线），
而在子串 s[substr_start, substr_end] 中就不用再切了，这样就能减少切的刀数，
因此在 substr_end 的位置，最小刀数是在（substr_start-1 里存的刀数 + 1）与自己已经存了的刀数之间取更小的那个。
如果不是，则无需处理。
这样当通过 substr_start, substr_end 的移动检测过所有的子串之后，从开始到每个字符的最小刀数也就都有了，
最后的结果就是一位数组里最后的一个值了。

在写代码的时候，跟原帖稍有不同的是，它填的是二维矩阵的上半部分，我填的是下半部分，依赖的方向不一样
具体的不同之处在之前的 005 - Longest Palindromic Substring 的 DP 再分析里分析过了，一个是垂直填，一个是水平填。

另外就是二维数组一开始用的 int 型的，但是当字符串长度大于一定值的时候，这个数组的大小就超过了默认栈的大小，
直接导致栈溢出，runtime error，（有个 case 有 1462个字符，算算都超过 8M 了）改成 bool 型之后就没有问题了。

class Solution {
public:
    int minCut(string s) {
        int len=s.length(), p1=0, p2=len-1;
        if(!len || 1==len)              //special cases: empty/single character
            return 0;

        bool mtx[len][len];             //"int" array might be bigger than the stack size for long strings
        int min_cut[len];              //store minimal cut number for substr s[0 ~ current character]
        for(int substr_end=0; substr_end<len; ++substr_end)
        {
            //pre-set max cut number for each substr end node
            min_cut[substr_end] = substr_end;

            //check each substr start node from 0~substr_end for possible smallest cut number
            for(int substr_start=0; substr_start<=substr_end; ++substr_start)
            {
                //initialize as substr is not palindrome
                mtx[substr_end][substr_start] = false;

                //only update min_cut if substr s[substr_start, substr_end] is palindrome:
                if(s[substr_start]==s[substr_end] &&                    //characters are equal
                    (1>=substr_end-substr_start ||                      //same or adjacent character
                        mtx[substr_end-1][substr_start+1]))             //other cases: rely on Right-Up
                {
                    //mark as is palindrome for future relying
                    mtx[substr_end][substr_start] = true;

                    //if substr s[0~substr_end] is palindrome we don't need cut it.
                    if(!substr_start)
                        min_cut[substr_end] = 0;
                    //if need one more cut for substr s[substr_start~substr_end],
                    //then add 1 to previous smallest cut number before substr s[substr_start~substr_end]
                    //which is the number stored in min_cut[substr_start-1]
                    else
                        min_cut[substr_end] = min(min_cut[substr_end], min_cut[substr_start-1]+1);
                }
            }
        }
        return min_cut[len-1];
    }
};
24 ms.

005 - Longest Palindromic Substring 的 DP 再分析

之前的帖子提到了看到的帖子都是通过如下公式来做 DP 的：
    dp[i, j] = 1                                   if i == j     同一个
             = s[i] == s[j]                        if j = i + 1 相邻
             = s[i] == s[j] && dp[i + 1][j - 1]    if j > i + 1 互为mirror

我自己的方法则不同，是把其中一个字符串反向了，换算坐标来匹配字符的，
匹配到了（相同字母）的格子则继承左上的值加一，（不同于上面的状态方程是依赖右下的值来置 1）
然后通过计算横纵坐标之差是不是就是填入的数字来得到正确的结果。

仔细再分析了一下它们的差异，也对这个状态方程理解得更清楚了一些。
对于前两种情况，比价容易理解，单独一个字符，或者两个字符相等且相邻，自然都构成回文。

对于第三种情况，实际上是，以对角线上的某个节点作为中心点，沿着垂直于对角线的方向（也就是反对角线）向两端延伸，
因此，可以如这个公式那样，写成（当 i，j 对应的字母相等时）一个格子的值，依赖于其左下的格子，
也可以写成（当 i，j 对应的字母相等时）一个格子的值，依赖于其右上的格子。
只不过前一种情况，是向右上延伸，后一种情况是向左下延伸而已。

而按照这里的状态方程，只需要处理 j > =i 的情况，或者 j <= i 的情况即可，也就是说，只需要计算半个矩阵就够了，
这点在实际跑的结果中也可以得到印证：我的 DP 跑出来是 460 ms，而处理半个矩阵的结果是 300 ms，少了三分之一。

另外，计算的过程也要注意：
按照上面的状态方程，后一个依赖的左下，也就是说，左下的值必须在当前格子之前计算出来，否则就错了。
所以，沿着这个方程，只能是在上半区纵向处理数据，这样才能保证被依赖的格子先计算出来，依赖的后计算出来。
而依赖的起点，要么是两个字符相邻，要么是两个下标指向同一个字符。

这样就有了下面的代码（来自一个博客）：

class Solution {
public:
    string longestPalindrome(string s) {
        int dp[s.size()][s.size()] = {0}, left = 0, right = 0, len = 0;
        for (int i = 0; i < s.size(); ++i) {
            for (int j = 0; j < i; ++j) {
                dp[j][i] = (s[i] == s[j] && (i - j < 2 || dp[j + 1][i - 1]));
                if (dp[j][i] && len < i - j + 1) {
                    len = i - j + 1;
                    left = j;
                    right = i;
                }
            }
            dp[i][i] = 1;
        }
        return s.substr(left, right - left + 1);
    }
};

以字符串 abcddcba 为例，它的执行结果：

1 0 0 0 0 0 0 1
1 0 0 0 0 1 0
    1 0 0 1 0 0
      1 1 0 0 0
        1 0 0 0
          1 0 0
            1 0
              1

而事实上，也完全可以把它写成对右上的依赖，这样计算的过程就不需要一列一列地来了，而是按照习惯的一行一行的来就好了：

string longestPalindrome2(string s) {
    int dp[s.size()][s.size()]={0}, left = 0, right = 0, len = 0;
    for (int i = 0; i < s.size(); ++i) {
        for (int j = 0; j < i; ++j) {
            dp[i][j] = (s[i] == s[j] && (i - j < 2 || dp[i - 1][j + 1])); //rely on R-U
            if (dp[i][j] && len < i - j + 1) {
                len = i - j + 1;
                left = j;
                right = i;
            }
        }
        dp[i][i] = 1;
    }
    return s.substr(left, right - left + 1);
}

它的执行结果：

1
0 1
0 0 1
0 0 0 1
0 0 0 1 1
0 0 1 0 0 1
0 1 0 0 0 0 1
1 0 0 0 0 0 0 1

现在在回头去看自己的 DP，其实也是可以通过只处理一般，以及说明依赖关系。来减少冗余的计算的，
只不过因为横向的字符串反向了，计算的结果将出现在正向的对角线上，而不是上面的反向的对角线上。

好了，说到这里算是差不多说清楚了。

Sunday, August 30, 2015

131 - Palindrome Partitioning

Palindrome Partitioning

Given a string s, partition s such that every substring of the partition is a palindrome.
Return all possible palindrome partitioning of s.
For example, given s = "aab",
Return

  [
    ["aa","b"],
    ["a","a","b"]
  ]

Hide Tags
Backtracking
Hide Similar Problems
(H) Palindrome Partitioning II (M) Word Break

当然还是DFS，逐个取子串检查，如果该子串是回文，则进入递归，遇到结尾如果还是则 push 结果。
刚开始跑一直都是20 ms，仔细查看了一下，是先取子串，然后丢到单独的函数里检查是否为回文，再做后面的操作的，
但实际上还可以更快，那就是在原始字符串 s 里通过下标检测指定范围是否是回文，是的话再截取子串做处理。
改过之后果然直接 12 ms，40%的提升呐！

class Solution {
public:
    vector<vector<string>> partition(string s) {
        vector<vector<string>> rst;
        vector<string> buf;
        dfs(s, 0, s.length(), buf, rst);
        return rst;
    }

    void dfs(string& s, int pos, int len, vector<string>& buf, vector<vector<string>>& rst)
    {
        for(int ii=pos; ii<len; ++ii)
        {
            int p1=pos, p2=ii;
            while(p1<p2 && (s[p1]==s[p2]))             //check palindrome in original string
                p1++, p2—;
            if(p1<p2)                                  //if substring is not a palindrome
                continue;

            string ss = s.substr(pos, ii-pos+1);
            buf.push_back(ss);
            if(ii==len-1)
                rst.push_back(buf);
            else
                dfs(s, ii+1, len, buf, rst);
            buf.pop_back();
        }
    }
};
12 ms.

Saturday, August 29, 2015

214 - Shortest Palindrome

Shortest Palindrome

Given a string S, you are allowed to convert it to a palindrome by adding characters in front of it. Find and return the shortest palindrome you can find by performing this transformation.

For example:

Given "aacecaaa", return "aaacecaaa".

Given "abcd", return "dcbabcd".

Credits:
Special thanks to @ifanchu for adding this problem and creating all test cases. Thanks to @Freezen for additional test cases.

Hide Tags

String

Hide Similar Problems

(M) Longest Palindromic Substring (E) Implement strStr()

仔细分析一下会发现，问题的关键，在于找到从 s[0] 开始的最长回文串，
找到了之后把剩下的部分reverse一下，加在前面，就是答案。

既然是找最大回文，自然想到之前的 Longest Palindromic Substring 里用的“马拉车”Manacher 算法。
只需要在每次更新 center 和半径的时候，检查转换成 s 里的起点是否还是 0 就好了，一旦不是，就直接break；
这样就找到了从 s[0] 开始的最大回文串，后面就easy了。按照这个思路写了，8 ms，很不错。
后来在另一处看到：“Manacher算法的特性，id == p[id]时回文一定是从s[0]开始的”，试了一下，用这个判断也可以。

查了一下，不少提到利用 KMP 算法的帖子，想想也是，字符串查找么，KMP 是个不错的思路。
看了一下利用 KMP 的代码，很简洁，一时之间没看懂，要复习一下 KMP 的实现再看估计好些。
从这个帖子进去，还发现一篇“从头到尾彻底理解KMP（2014年8月22日版）”对 KMP 讲得非常仔细，回头一起学习下。

class Solution {
public:
    string shortestPalindrome(string s) {
        int len=s.length();
        if(!len || 1==len)
            return s;

        //Preprocessing of character to avoid even palindrome cases
        string buf = "$#";                                          //unique character as beginning
        for(int ii=0; ii<len; ++ii)
            buf += s[ii], buf += '#';                               //character as gapping
        buf += "&";                                                 //unique character as ending

while(buf[ii+lp[ii]]==buf[ii-lp[ii]]) //scanning
lp[ii]++;

            if(ii+lp[ii]-1 > rr)                                    //exploring new right range
            {
                center=ii, r=lp[ii];                                //update new center and radius

                if(ii==lp[ii] && mxx<r)                             //only check starting from s[0]
                {
                    //if(0<(ii-r)/2)                                //exit if beginning is not s[0]
                    //    break;
                    mxx=r, pos=ii;                                  //update longest and its center
                }
            }
        }

        int plen=mxx-1;                                             //length of palindrome from s[0]
        string addme = (plen==len) ? "" : s.substr(plen, len-plen); //get the rest part of s
        reverse(addme.begin(), addme.end());                        //reverse the rest part
        return addme+s;                                             //return combined string
    }
};
8 ms.

005 - Longest Palindromic Substring

Longest Palindromic Substring

Given a string S, find the longest palindromic substring in S. You may assume that the maximum length of S is 1000, and there exists one unique longest palindromic substring.

Hide Tags

String

Hide Similar Problems

(H) Shortest Palindrome (E) Palindrome Permutation

经典的题。
一开始写的是每个扫描每个字符，并以它为中心，左右展开扫描，复杂度 O(n*n)，空间 O(1)，结果 68 ms，还不错。
这个思路稍微需要注意的是，需要把奇偶长度的结果分开处理。
另外可以不用每次刷新 rst，只需要刷新（ii-pos+1）这个其实位置就好。

然后写了一个 DP 的解法，也是 O(n*n)。写完了去看别人的帖子，才发现无意之间写了个比较另类的 DP……
看到的帖子上的 DP 都是把字符串直接放在两个邻边，然后去判断：
    dp[i, j] = 1                                   if i == j     同一个
             = s[i] == s[j]                        if j = i + 1 相邻
             = s[i] == s[j] && dp[i + 1][j - 1]    if j > i + 1 互为mirror
【补】重新回头琢磨了一下帖子上的这个算法，跟自己的DP比较了一下，在另外的帖子里。

我写的时候是把其中一个反过来的，这样情况就更简单一些，只要两者相等，就继承左上再加一，否则为0。
不过反过来的那一边，到字符串里取值的时候，要把下标换算一下（从 jj 到 kk）。
这样在移动的过程中， ii 是向所在串的尾部移动，jj（kk）则是向头部移动。
这个过程中，可能会出现一段字符串跟另一段字符串正好逆序对上（最后的大集合里有）的情况，
但它们俩并不是同一段落，所以这种情况并不是回文，过滤它的方法就是通过 ii 和 kk 的关系：
mtx[ii][jj] 的格子里，存储的就是这段回文的长度，所以，计算 ii-mtx[ii][jj]+1，结果应该就是这段回文的开头位置，也就是 kk
这样就能准确地滤掉那些不是回文的遥相呼应的段落了。
比较惊讶的是，跑出来是 460 ms! 耗时相当大，应该跟频繁地数组操作，以及冗余的遥相呼应的情况有关吧。。。

除了这两个思路，还有一个极其经典的算法：Manacher’s Algorithm，线性的复杂度！！
它的基本思想首先是利用插入的额外字符，把奇偶回文的情况统统变成了奇数长度的回文，
然后利用已经发现的回文，来辅助该回文覆盖范围内的后续（右边）的点去判断自己的回文范围。

网上对这个算法的讲解很多，有不少讲得比较清楚的：
        Manacher's ALGORITHM， Manacher算法学习与总结，Longest Palindromic Substring Part II
知乎上还专门有复杂度的证明：如何证明Manacher算法的时间复杂度是O(n)?

不过这些讲解里头，有几个地方还是比较绕的：
一个是下标，很多人都用 2*id–i 来表示 i 以 id 为中心点向左映射的点 i‘，看了一会儿才明白，是 id–(i-id)
另一个是分析的层次：
首先是 i 在不在已经 explored 的范围内。不在 –> 重新scan，在的话，则要细分情况。
    对于在范围内的情况，分为三个情形：
         映射点的回文长度小于自己到explored的最右端的距离，
              此时就等于映射点的范围，不存在更长的可能（最右端之前的字母是已知的回文！）。
         映射点的回文长度等于自己到explored的最右端的距离，实际上是映射点的回文正好跟中心点的回文起点（最左端）重合，
              此时虽然也是取映射点的范围，但是仍然存在向右（最右端之外）再看还能让回文延伸的可能性，所以需要继续 scan。
         映射点的回文长度大于自己到explored的最右端的距离，
             此时因为最右端之外还不知道是啥，所以只可能取自己到最右端的距离，并且需要继续 scan。

借用一下别人博客的图片：

按照这个判断，写了一下实现，并且做了精简，跑出来的结果是。。。 4 ms！太猛了。。。

class Solution {
public:
    //------------------------------------- scan from each character -------------------------------------
    string longestPalindrome_scan(string s) {
        int len=s.length(), ll=0;
        string rst="";
        if(len)
        {
            for(int ii=0; ii<len; ++ii)
            {
                int len1=0, len2=0, len3=0, len4=0, jj=ii+1;
                while(len>len1+ii && 0<=ii-len1 && (s[ii-len1] == s[ii+len1]))        //for odd result
                        len2 += 1+(0<len1), len1++;

                if(ii<len-1 && s[ii] == s[jj])
                {
                    while(len>len3+jj && 0<=ii-len3 && (s[ii-len3] == s[jj+len3]))    //for even result
                        len4 += 2, len3++;
                }

                int longest = max(len2, len4), pos = (longest==len2) ? len1 : len3;
                if(ll<longest)
                    ll=longest, rst=s.substr(ii-pos+1, longest);
            }
        }
        return rst;
    }

    //--------------------------------------- DP with extra matrix ---------------------------------------
    string longestPalindrome_dp(string s) {
        int len=s.length(), ll=0, mtx[len][len];
        string rst="";
        for(int ii=0; ii<len; ++ii)
        {
            for(int jj=0; jj<len; ++jj)
            {
                int kk = len-1-jj;
                mtx[ii][jj] = (s[ii]==s[kk]) ? (1+((!ii||!jj) ? 0 : mtx[ii-1][jj-1])) : 0;

                if(ii-mtx[ii][jj]+1==kk && ll<=mtx[ii][jj])
                    ll=mtx[ii][jj], rst=s.substr(kk, ll);   //can save (ii-pos+1) instead of refresh rst
            }
        }
        return rst;
    }

    //--------------------------------------- Manacher’s Algorithm ---------------------------------------
    string longestPalindrome(string s) {
        int len=s.length();
        if(!len)
            return s;

        //Preprocessing of character to avoid even palindrome cases
        string buf = "$#";                                          //unique character as beginning
        for(int ii=0; ii<len; ++ii)
            buf += s[ii], buf += '#';                               //character as gapping
        buf += "&";                                                 //unique character as ending

        //linear calculation to find longest palindrome substring
        int lenb=buf.size(), mxx=0, pos=0;                          //mxx, pos are for final answer
        int lp[lenb], center=0, r=0;
        for(int ii=0; ii<lenb-1; ++ii)                              //no worries of invalid access
        {
            //center+r-1 is the fastest right range explored;
            //if ii is out of this right range, then need scan from ii, so set lp[ii]=0;
            //if ii is in this range, take smaller between:
            //      - ii's left mirror based on center: lp[center-(ii-center)]=lp[center+center-ii];
            //      - and the distacne from ii to the right range explored: (center+r-1) - ii
            // here if they are equal, lp[ii] might be longer with unexplored characters,
            // if not, no necessary to explore more for lp[ii]
            lp[ii] = (center+r-1>ii) ? min(lp[center-(ii-center)], (center+r-1) -ii) : 0;

            //we can check if(!lp[ii] || (center+r-1-ii)==lp[ii]) for cases need scanning,
            //but not necessarily since for unnecessary cases it won't step into while cycle.
            while(buf[ii+lp[ii]]==buf[ii-lp[ii]])                   //scanning
                lp[ii]++;

            if(ii+lp[ii]-1 > center+r-1)                            //exploring new right range
            {
                center=ii, r=lp[ii];                                //update new center and radius

                if(mxx<r)
                    mxx=r, pos=ii;                                  //update longest and its center
            }
        }

        //build answer
        return s.substr((pos-mxx)/2, mxx-1);                        //==(pos/2-1-(mxx/2-1), mxx-1);
    }
};
68 ms, 460 ms, 4 ms.

精简之后的代码逻辑上简练但是不便于理解，精简之前的代码逻辑上更为清晰：

    string longestPalindrome(string s) {
        int len=s.length();
        if(!len)
            return s;

        //Preprocessing of character to avoid even palindrome cases
        string buf = "$#";                                          //unique character as beginning
        for(int ii=0; ii<len; ++ii)
            buf += s[ii], buf += '#';                               //character as gapping
        buf += "&";                                                 //unique character as ending

        //linear calculation to find longest palindrome substring
        int lenb=buf.size(), mxx=0, pos=0;
        int lp[lenb], center=0, r=0;
        for(int ii=0; ii<lenb-1; ++ii)                              //no worries of invalid access
        {
            if(center+r-1>ii)                                       //ii is in explored range
            {
                int right_range=center+r-1, ii2rr=right_range-ii;   //distance from ii to right range
                int offset = ii-center, im = center - offset;       //mirror node of ii based on center
                lp[ii] = min(lp[im], ii2rr);                        //may need scan (not all cases)
            }
            else                                                    //ii is farther than right range
                lp[ii] = 0;                                         //will scan for ii later

            if(!lp[ii] || (center+r-1-ii)==lp[ii])                  //cases need scanning
            {
                while(buf[ii+lp[ii]]==buf[ii-lp[ii]])               //scanning
                    lp[ii]++;
            }

            if(ii+lp[ii]-1 > center+r-1)                            //exploring new right range
            {
                center=ii, r=lp[ii];                                //update new center and radius

                if(mxx<r)
                    mxx=r, pos=ii;                                  //update longest and its center
            }
        }

        //build answer
        return s.substr(pos/2-1-(mxx/2-1), mxx-1);
    }

这里 lp[ii] 里保存的时候，如果只有自己，长度是1，如果扩展到#，或者下一个字母，依次累加，
所以计算最右端的范围的时候，都要减去 1。
最终要把中心点的坐标，换算回原来的字符串下标，最大长度因为是包含中心点以及一侧的字母和#，
所以换算回原来的字符串里生成结果的时候，也需要减去 1（因为总会多包含一个#）。

Friday, August 28, 2015

128 - Longest Consecutive Sequence

Longest Consecutive Sequence

Given an unsorted array of integers, find the length of the longest consecutive elements sequence.

For example,
Given [100, 4, 200, 1, 3, 2],
The longest consecutive elements sequence is [1, 2, 3, 4]. Return its length: 4.

Your algorithm should run in O(n) complexity.

Hide Tags

Array

既然是 O(n)，又不知道数组的取值范围，那只能是哈希了，
把所有值过一遍，然后再从小到大检查连续性，同步更新最大的连续个数即可。
刚开始用map做了一次，28 ms，改用 set 之后，24 ms。

class Solution {
public:
    int longestConsecutive(vector<int>& nums) {
        set<int> buf;
        for(auto n : nums)
            buf.insert(n);

        int rst=0, consecutive=0, last = 0;
        for(auto b : buf)
        {
            b==last+1 ? consecutive++ : consecutive=1;
            rst = max(rst, consecutive);
            last=b;
        }
        return rst;
    }
};
24 ms.

072 - Edit Distance

Edit Distance

Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)

You have the following 3 operations permitted on a word:

a) Insert a character
b) Delete a character
c) Replace a character

Hide Tags

Dynamic Programming String

Hide Similar Problems

(M) One Edit Distance

眼界不够，看了wiki 才知道这个问题还是有来头的。
先说题目本身，解完了再说它的渊源。

把两个字符串分别放在 DP 的矩阵横向/纵向的两个邻边，
如果在某个格子（x，y）从两个字符串里取到的字符不同，则可能要进行三种操作：插入，删除，替换。
仔细观察可以知道，插入，删除，替换分别对应动态编程中的向右，向下，向右下：
    向下说明要在纵向的字符串里多跳过一个字符，然后再与横向字符串的下一个比较；
    向右则反之，说明要跳过横向字符串里的下一个字符，去跟下下一个字符比较；
        也就是说，实际上对一个字符串的删除，等效于对另一个字符串的插入。
    而向右下，则是直接把一个字符串中的字符替换成另一个中的字符。
    所以此时当前格子的值就是取这三个操作中最小的一个（就是要找最小的嘛），加上当前的操作cost（1）；
而如果取到字母相同，很显然无需额外操作，只需要继承之前的结果，也就是（x-1，y-1）里面的值就可以了。

由此可以得到如下解答：
构建 DP 矩阵 mtx，两字符串居于邻边，但从 1 开始，而不是 0，因为主要的计算在矩阵中间。
注意首行/列分别对应其中一个字符串为空的情形，也就说，步步都插入，所以distance就等于下标：

然后依照规则填写，（1,1）分别对应 b 和 s，所以是取 0 再加 1 = 1；（1,2）则是 1+1=2，（1,3）则是继承 2；……
最终将得到右侧的表格，最后一个格子的 6 就是需要的最小步骤了。

class Solution {
public:
    int minDistance(string word1, string word2) {
        int len1=word1.length()+1, len2=word2.length()+1;

        //use vector get 28 ms, use array get 16 ms.
        //vector<vector<int>> mtx(len1, vector<int>(len2, 0));
        int mtx[len1][len2];                    //will initialize later
        for(int x=0; x<len1; ++x)
        {
            mtx[x][0] = x;                      //initialization as if word2 is an empty string
            for(int y=1; y<len2; ++y)
            {
                if(word1[x-1]==word2[y-1])
                    mtx[x][y] = mtx[x-1][y-1];
                else
                    mtx[x][y] = (!x) ? y :      //initialization as if word1 is an empty string
                        //updating minimal distance needed of replace/insert/delete
                        (1 + min(mtx[x-1][y-1], min(mtx[x-1][y], mtx[x][y-1])));
            }
        }
        return mtx[len1-1][len2-1];
    }
};
16 ms.

这个问题的确是有来头的，而且在自然语言处理，拼写检测，生物识别等等领域都是相当重要的。（wiki）
这里的问题是简化过的，实际上还应该带上每个操作的权重，这样就更接近实际问题的处理了：
$\begin{align}d_{i0} &= \sum_{k=1}^{i} w_\mathrm{del}(b_{k}), & & \quad \text{for}\; 1 \leq i \leq m \\<br />d_{0j} &= \sum_{k=1}^{j} w_\mathrm{ins}(a_{k}), & & \quad \text{for}\; 1 \leq j \leq n \\<br />d_{ij} &= \begin{cases} d_{i-1, j-1} & \text{for}\; a_{j} = b_{i}\\ \min \begin{cases} d_{i-1, j} + w_\mathrm{del}(b_{i})\\ d_{i,j-1} + w_\mathrm{ins}(a_{j}) \\ d_{i-1,j-1} + w_\mathrm{sub}(a_{j}, b_{i}) \end{cases} & \text{for}\; a_{j} \neq b_{i}\end{cases} & & \quad \text{for}\; 1 \leq i \leq m, 1 \leq j \leq n.\end{align}$
关于它的算法，DP 有两种，一种是如上面的用二维矩阵，空间是 O(mn)，另一种是用两个一维数组来滚动，空间是 O(m+n)，
另外还有一种改进的Hirschberg's algorithm，空间只是 O(min{n,m})。

另一个有意思的是，如果把三个操作中的替换去掉，只能增减的话，它就是 LCS (Longest common subsequence) 了；
而如果只是允许替换的话，（对于长度相等的字符串），它实际上就成了Hamming distance。

还有一些其他的演化，例如：
实际的字符串检测中还有一个常见的场景：两个相邻字符位置反了，因此还可以有一个操作，叫做互换（transposition），
另外在 OCR 输出校对的时候，还有 merge 和 split 操作，做一对多的更为复杂的替换。

Thursday, August 27, 2015

174 - Dungeon Game

Dungeon Game

The demons had captured the princess (P) and imprisoned her in the bottom-right corner of a dungeon. The dungeon consists of M x N rooms laid out in a 2D grid. Our valiant knight (K) was initially positioned in the top-left room and must fight his way through the dungeon to rescue the princess.

The knight has an initial health point represented by a positive integer. If at any point his health point drops to 0 or below, he dies immediately.

Some of the rooms are guarded by demons, so the knight loses health (negative integers) upon entering these rooms; other rooms are either empty (0's) or contain magic orbs that increase the knight's health (positive integers).

In order to reach the princess as quickly as possible, the knight decides to move only rightward or downward in each step.

Write a function to determine the knight's minimum initial health so that he is able to rescue the princess.

For example, given the dungeon below, the initial health of the knight must be at least 7 if he follows the optimal path RIGHT-> RIGHT -> DOWN -> DOWN.

-2 (K)    -3        3
-5        -10       1
10        30      -5 (P)

Notes:

The knight's health has no upper bound.
Any room can contain threats or power-ups, even the first room the knight enters and the bottom-right room where the princess is imprisoned.

Credits:
Special thanks to @stellari for adding this problem and creating all test cases.

Hide Tags

Dynamic Programming Binary Search

Hide Similar Problems

(M) Unique Paths (M) Minimum Path Sum

典型DP。
既然是要到达终点还活着，那就从终点往前推算。
终点如果是正的，到达之前至少需要一滴血，
若是负的，就得要保证加上这个负的还剩一滴血，那就是至少有 1 - （该负数）滴血。
这样就得到了到达终点之前的两个点必须有多少滴血了，然后以此类推，即可一路走到起点。

对于其他的点，不在边上的，应该从右边和下边的两个点里选择最小的，减去自己的，即为到达是至少要有的血数
注意计算的结果如果是小于 1 的，就应该直接写上 1，不然就挂啦。
在边上的，可以假想边的外面有个 INT_MAX，跟同一边边上的后一个之间求最小（也就是直接取同一边的后面一个）
这样就可以把代码统一了。

另外好奇 hint 里有个二分查找，难道是要每次选一个 mid，代入到矩阵里，dfs 算一次到达剩余的血有多少吗。。。？
试过一下，退出的条件不能直接写到达P的时候剩余 1 或者 1-dungeon[sz-1][zz-1] （例如 [[0,5],[-2,-3]]），没有深究。

class Solution {
public:
    int calculateMinimumHP(vector<vector<int>>& dungeon) {
        if(!dungeon.size() || !dungeon[0].size())
            return 0;

        return dp(dungeon, dungeon.size(), dungeon[0].size());
    }

    int dp(vector<vector<int>>& mtx, int sz, int zz)
    {
        for(int ii=sz-1; ii>=0; ii--)
        {
            for(int jj=zz-1; jj>=0; jj--)
            {
                int atleast = 1;
                if(sz-1>ii || zz-1>jj)
                    atleast = min((sz-1>ii)?mtx[ii+1][jj]:INT_MAX, (zz-1>jj)?mtx[ii][jj+1]:INT_MAX);

                mtx[ii][jj] = max(1, atleast - mtx[ii][jj]);
            }
        }
        return mtx[0][0];
    }
};
16 ms.

037 - Sudoku Solver

Sudoku Solver

Write a program to solve a Sudoku puzzle by filling the empty cells.

Empty cells are indicated by the character '.'.

You may assume that there will be only one unique solution.

A sudoku puzzle... ...and its solution numbers marked in red.

Hide Tags

Backtracking Hash Table

Hide Similar Problems

(E) Valid Sudoku

本来以为是拼脑力的一道题，没想到拼的只是体力，无趣程度仅次于罗马字数字互转那两道题。

拿到题目首先想到的自然是回溯，
同时还想到的是，对每个空位，最好是能基于行，列，所在九宫列出可用的数字，这样便于减少循环嘛。
所以写了一个基于这个思路的 solution，在每个空格都计算available的数字，并从中选择然后递归。

递归的时候被一个破问题 block 了不少时间：
起初的做法是，先 [x][y+1] 递归进去，做完返回之后再 [x+1][y] 递归，就是这里坏事了。
从 [0][0] 开始，一路横着递归到 [0][8]，然后因为此时 y<8 不成立，所以一路纵向递归到 [8][8]，然后逐步返回。
返回到 [0][8]，此格的递归全部完成，因此再次返回，来到 [0][7]，
在 [0][7] 这里，是从 [x][y+1] 返回的，所以还需要做 [x+1][y]，于是进入第二部分递归，一路纵向直到 [7][8]。

问题就在这里，一旦在到 [7][8] 的途中出现 false，就会往回退，
但是！退到 [0][7] 之后，再回退是回退到 [0][6]，而不是已经填过数字的第 8 列！！
这就意味着，回退清除已经填充的数字的时候，右边已经填充的列们，是不会被清除的。。。！！
换句话说，回溯回不去了！！

在这里郁闷了好一阵子，四个方向 dfs 也不对，怎么才能回去呢？想来想去没有好的法子。
最后参考了网上的帖子里的做法，一行走到最右边，强制再从下一行的最左边开始……
是的，改成这样之后，问题解决了，除去下标运算没有减去 ‘1’ 的问题，计算出了正确答案，耗时 56 ms，不算好。
但是怎么看怎么觉得别扭，丑陋如斯！

class Solution {
public:
    //----------------------- calculate availabe numbers then recursive -----------------------
    void solveSudoku_dave(vector<vector<char>>& board) {
        if(9==board.size() && 9==board[0].size())
        {
            vector<vector<bool>> tried(9, vector<bool>(9));

            fill(board, 0, 0, tried);
        }
    }

    bool fill(vector<vector<char>>& board, int x, int y, vector<vector<bool>>& tried)
    {
        bool rst = true;
        if(9>x && 9>y)
        {
            vector<char> ava;
            char cc = board[x][y];
            if('.' == cc)
            {
                available(board, x, y, ava);
                if(!ava.size())
                    return false;
            }
            else
                ava.push_back(cc);

            for(auto a : ava)
            {
                tried[x][y] = true;
                board[x][y] = a;
                rst = true;

if(rst && y<8 && false==tried[x][y+1])
rst &= fill(board, x, y+1, tried);

if(8==y)
rst &= fill(board, x+1, 0, tried);

                //就是在这里给自己挖了个坑，并且只能用丑陋无比的回0来解决。
                //if(rst && x<8 && false==tried[x+1][y])
                //    rst &= fill(board, x+1, y, tried);

                if(false == rst)
                    board[x][y] = cc, tried[x][y]=false;
                else
                    break;
            }
        }
        return rst;
    }

    void available(vector<vector<char>>& board, int x, int y, vector<char>& ava)
    {
        int num[]={1,2,3,4,5,6,7,8,9};
        for(int ii=0; ii<9; ++ii)
        {
            int idx = board[x][ii]-'1', idy=board[ii][y]-'1';

            if('.' != board[x][ii])
                    num[idx] = -1;

            if('.' != board[ii][y])
                    num[idy] = -1;
        }

        int blk_xstart=(x)/3*3, blk_ystart=(y)/3*3, blk_xend=blk_xstart+3, blk_yend=blk_ystart+3;
        for(int ii=blk_xstart; ii<blk_xend; ++ii)
        {
            for(int jj=blk_ystart; jj<blk_yend; ++jj)
            {
                int idx = board[ii][jj]-'1';
                if(0<num[idx] && '.' != board[ii][jj])
                        num[idx] = -1;
            }
        }

        for(int ii=0; ii<9; ++ii)
            if(0<num[ii])
                ava.push_back((char)(num[ii]+'0'));
    }
};
56 ms.

网上的帖子里流行的做法，大多数是不管三七二十一，对空的格子直接 1 到 9 循环尝试，
每次填入一个值，在借鉴Valid Sudoku的思路，判定整个矩阵是否合乎规则，
有效的话，就递归去搞定下一个空格，不行就回头再重来，直到最后。
整个操作的过程，是在 for 循环里做的，应该不算是 DFS ，但倒可以算是“回溯”。。。
隐约觉得这个应该更加费时，参照一个帖子写了一个，果不其然，80 ms，
参照另一个帖子做了一点优化，每次递归不要从头开始查，而是接着已经走过的部分继续查，
有些提高，从 80 ms 到了 64 ms，没有更好了。

class Solution {
public:
    //-------------------- no available calculation, do 1~9 each time --------------------
    void solveSudoku2(vector<vector<char>>& board) {
        if(9==board.size() && 9==board[0].size())
            solving(board, 0, 0);
    }

    //没有优化，每次重头查
    bool solving(vector<vector<char>>& board)
    {
        for(int ii=0; ii<9; ++ii)
        {
            for(int jj=0; jj<9; ++jj)
            {
                if('.'==board[ii][jj])
                {
                    for(char cc='1'; cc<='9'; ++cc)
                    {
                        board[ii][jj] = cc;         //try all possible values
                        if(isValid(board, ii, jj)
                            && solving(board))      //recursive call, backtracking
                            return true;
                        board[ii][jj] = '.';        //recover old value when go back
                    }
                    return false;                   //all possible numbers are tried.
                }
            }
        }
        return true;                                //no more '.' to solve
    }

    //优化之后，从现有位置继续
    bool solving(vector<vector<char>>& board, int xx, int yy)
    {
        for(int ii=xx; ii<9; ++ii)
        {
            for(int jj=(0+(xx==ii)?yy:0); jj<9; ++jj)   //jj=yy, then =0 !!
            {
                if('.'==board[ii][jj])
                {
                    for(char cc='1'; cc<='9'; ++cc)
                    {
                        board[ii][jj] = cc;             //try all possible values
                        if(isValid(board, ii, jj)
                            //recursive call from next element, backtracking
                            && solving(board, ii+(8==jj), (8==jj)?0:(jj+1)))
                            return true;
                        board[ii][jj] = '.';            //recover old value when go back
                    }
                    return false;                       //all possible numbers are tried.
                }
            }
        }
        return true;                                    //no more '.' to solve
    }

    bool isValid(vector<vector<char>>& board, int x, int y)
    {
        for(int ii=0; ii<9; ++ii)
            if((y!=ii && board[x][ii]==board[x][y]) || (x!=ii && board[ii][y]==board[x][y]))
                return false;

        int blk_xstart=(x)/3*3, blk_ystart=(y)/3*3, blk_xend=blk_xstart+3, blk_yend=blk_ystart+3;
        for(int ii=blk_xstart; ii<blk_xend; ++ii)
            for(int jj=blk_ystart; jj<blk_yend; ++jj)
                if(ii!=x && jj!=y && board[ii][jj]==board[x][y])
                    return false;

return true;
}
};
64 ms.

看看性能有减无增，就跑去讨论区里看，本来以为会有什么精巧的算法，结果…，大失所望。
基本上一句话，空间换时间。
例如下面这个例子，弄三个全矩阵的 buffer，本别标记行/列/所在九宫的占用情况，占用即更新。
号称是 dfs，实际上看看那个 while，完全等于两重 for 循环，这也能叫 dfs…
另外那句“ board[row][col] = '.'; ”，原文是在 1~9 的 for 循环之后，功能上没问题，因为只判断 buffer么，
但是语义上不能算对，应该挪到 for 循环里头。

class Solution {
public:
    //----------------------------- full buffer for quick check -----------------------------
    void buildUsedFlag(vector<vector<char>> &board,
        bool rowUsed[9][9], bool colUsed[9][9], bool blkUsed[9][9])
    {
        for(auto i=0; i<9 ; ++i)
            for(auto j=0; j<9; ++j)
                if(board[i][j]!='.')
                    rowUsed[i][board[i][j]-'1'] = true,
                    colUsed[j][board[i][j]-'1'] = true,
                    blkUsed[(i/3) *3 +(j/3)][board[i][j]-'1'] =true ;
    }

    bool dfs(vector<vector<char>> &board, int row, int col,
        bool rowUsed[9][9], bool colUsed[9][9], bool blkUsed[9][9])
    {
        while(row < 9 && board[row][col] != '.')
        {
            row += (col+1)/9;
            col = (col+1) % 9;
        }

        if(row==9)
            return true;

        int i, blkIdx = (row/3) * 3 + col/3;
        for(i=0; i<9; i++)
        {
            if(rowUsed[row][i] || colUsed[col][i] || blkUsed[blkIdx][i])
                continue;

            rowUsed[row][i] = colUsed[col][i] = blkUsed[blkIdx][i] = true;
            board[row][col] = '1' + i;
            if(dfs(board, row, col, rowUsed, colUsed, blkUsed))
                return true;
            board[row][col] = '.';
            rowUsed[row][i] = colUsed[col][i] = blkUsed[blkIdx][i] = false;
        }
        return false;
    }

    void solveSudoku(vector<vector<char>>& board) {
        bool rowUsed[9][9], colUsed[9][9], blkUsed[9][9];
        fill_n(&rowUsed[0][0], 81, false);
        fill_n(&colUsed[0][0], 81, false);
        fill_n(&blkUsed[0][0], 81, false);

        buildUsedFlag(board, rowUsed, colUsed, blkUsed);

        dfs(board, 0, 0, rowUsed, colUsed, blkUsed);
    }
};
4 ms.

后来看到一个算是有点意思的解法：
其实也还算是空间换时间，但同时也通过限制条件来提前算好了每个空的格子的可能取值，
而且这个限制条件的矩阵是随着每次填入的值来动态更新的，这个挺有创意。（这一部分可以借鉴到自己的代码里）
另外就是对于每个元素的可能性进行排序，从而尽可能减少不必要的反复尝试，犀利。（这个真没想到）

class Solution { struct cell // encapsulates a single cell on a Sudoku board { uint8_t value; // cell value 1..9 or 0 if unset // number of possible (unconstrained) values for the cell uint8_t numPossibilities; // if bitset[v] is 1 then value can't be v bitset<10> constraints; cell() : value(0), numPossibilities(9),constraints() {}; }; array<array<cell,9>,9> cells; // sets the value of the cell to [v] // the function also propagates constraints to other cells and deduce new values where possible bool set(int i, int j, int v) { // updating state of the cell cell& c = cells[i][j]; if (c.value == v) return true; if (c.constraints[v]) return false; c.constraints = bitset<10>(0x3FE); // all 1s c.constraints.reset(v); c.numPossibilities = 1; c.value = v; // propagating constraints for (int k = 0; k<9; k++) { // to the row: if (i != k && !updateConstraints(k, j, v)) return false; // to the column: if (j != k && !updateConstraints(i, k, v)) return false; // to the 3x3 square: int ix = (i / 3) * 3 + k / 3; int jx = (j / 3) * 3 + k % 3; if (ix != i && jx != j && !updateConstraints(ix, jx, v)) return false; } return true; } // update constraints of the cell i,j by excluding possibility of 'excludedValue' // once there's one possibility left the function recurses back into set() bool updateConstraints(int i, int j, int excludedValue) { cell& c = cells[i][j]; if (c.constraints[excludedValue]) { return true; } if (c.value == excludedValue) { return false; } c.constraints.set(excludedValue); if (--c.numPossibilities > 1) return true; for (int v = 1; v <= 9; v++) { if (!c.constraints[v]) { return set(i, j, v); } } assert(false); } // backtracking state - list of empty cells vector<pair<int, int>> bt; // find values for empty cells bool findValuesForEmptyCells() { // collecting all empty cells bt.clear(); for (int i = 0; i < 9; i++) { for (int j = 0; j < 9; j++) { if (!cells[i][j].value) bt.push_back(make_pair(i, j)); } } // making backtracking efficient by pre-sorting empty cells by numPossibilities sort(bt.begin(), bt.end(), [this](const pair<int, int>&a, const pair<int, int>&b) { return cells[a.first][a.second].numPossibilities < cells[b.first][b.second].numPossibilities; }); return backtrack(0); } // Finds value for all empty cells with index >=k bool backtrack(int k) { if (k >= bt.size()) return true; int i = bt[k].first; int j = bt[k].second; // fast path - only 1 possibility if (cells[i][j].value) return backtrack(k + 1); auto constraints = cells[i][j].constraints; // slow path >1 possibility. // making snapshot of the state array<array<cell,9>,9> snapshot(cells); for (int v = 1; v <= 9; v++) { if (!constraints[v]) { if (set(i, j, v)) { if (backtrack(k + 1)) return true; } // restoring from snapshot, // note: computationally this is cheaper // than alternative implementation with undoing the changes cells = snapshot; } } return false; } public: void solveSudoku(vector<vector<char>> &board) { cells = array<array<cell,9>,9>(); // clear array // Decoding input board into the internal cell matrix. // As we do it - constraints are propagated and even additional values are set as we go // (in the case if it is possible to unambiguously deduce them). for (int i = 0; i < 9; i++) { for (int j = 0; j < 9; j++) { if (board[i][j] != '.' && !set(i, j, board[i][j] - '0')) return; // sudoku is either incorrect or unsolvable } } // if we're lucky we've already got a solution, // however, if we have empty cells we need to use backtracking to fill them if (!findValuesForEmptyCells()) return; // sudoku is unsolvable // copying the solution back to the board for (int i = 0; i < 9; i++) { for (int j = 0; j < 9; j++) { if (cells[i][j].value) board[i][j] = cells[i][j].value + '0'; } } } }; 作者说是2 ms，有人说跑的是 0 ms. 基本上这道题的核心是：空间换时间，1 ~ 9 暴力往上呼也不会差，从这个角度而言，没啥意思。如果在此基础上能够缩小每个空格备选的范围，会省时间，如果能再对可能性排序，那就算玩得有点儿意思了。

Dive, Dave !

Pages

Monday, August 31, 2015

132 - Palindrome Partitioning II

Palindrome Partitioning II

005 - Longest Palindromic Substring 的 DP 再分析

Sunday, August 30, 2015

131 - Palindrome Partitioning

Palindrome Partitioning

Saturday, August 29, 2015

214 - Shortest Palindrome

Shortest Palindrome

005 - Longest Palindromic Substring

Longest Palindromic Substring

Friday, August 28, 2015

128 - Longest Consecutive Sequence

Longest Consecutive Sequence

072 - Edit Distance

Edit Distance

Thursday, August 27, 2015

174 - Dungeon Game

Dungeon Game

037 - Sudoku Solver

Sudoku Solver

BU

CS.BU

Scientific American Content: Global