2011/01/07

Python implements C++ std::remove_if

In python, if you want to delete item in a list, you may try this:
m = [x for x in m if not f(x)]

f() is a condition. element x will be removed if x satisfies condition f.

If you want to delete elements in-place, you may try:
for i in range(len(m)-1, -1, -1):
    if f(m[i]) : del m[i]

The problem is that, you have to iterate index reversely.

I want a function like std::remove_if, that I can do remove in-place:
p = remove_if(m, f)
del m[p:]

Beware that remove_if doesn't change the length of list m. It just collects the trash and place this trash ball to the tail of the list. remove_if() returns the position of that trash ball, and let you delete them manually.

Below is implementation of remove_if:
#!/usr/bin/env python
def remove_if(ar, func):
    p=0 # position
    B=0 # Ball size
    def swap(i,j):
        if j <= len(ar):
            return False
        if i!=j:
            ar[i],ar[j] = ar[j],ar[i]
        return True


    while True:
        while func(ar[p]):
            B+=1
            if not swap(p, p+B):
                return p

        p+=1
        if not swap(p, p+B):
            return p



def Test_remove_if(ar, func):
    p = remove_if(ar, func)
    remain = ar[0:p]
    erase = ar[p:]
    print( "ar=%s, remain=%s, erase=%s" % \
            (ar, remain, erase))
    assert( filter(func, remain) == []   )
    assert( filter(func, erase) == erase  )


print( "\nar=[1,2,3,4,5], func = lambda x:x%2 != 0" )
Test_remove_if([1,2,3,4,5], lambda x:x%2 != 0)

print( "\nar=[1,2,3,4,5], func = lambda x:x%2 == 0" )
Test_remove_if([1,2,3,4,5], lambda x:x%2 == 0)


print( "\nar=[1,2,3,4,5], func = lambda x:x%3 != 0" )
Test_remove_if([1,2,3,4,5], lambda x:x%3 != 0)

print( "\nar=[1,2,3,4,5], func = lambda x:x<2 " )
Test_remove_if([1,2,3,4,5], lambda x:x<2)

print( "\nar=[1,2,3,4,5], func = lambda x:x%5==0 " )
Test_remove_if([1,2,3,4,5], lambda x:x%5==0 )

print( "\nar=[1,2,3,4,5,6], func = lambda x:x%2==0 " )
Test_remove_if([1,2,3,4,5,6], lambda x:x%2==0 )



The algorithm is like rolling a snowball. The snowball is an aggreation of elements you want to delete. At beginning, ball is at index 0, ball size is zero. While ball is rolling from index 0 to n-1, you meet the trash. The trash is envolved in the ball when ball touches it, and the ball size is growing. Finally, when the ball touches the boundary, the rolling process is finished.

The speed complexity is O(n) and space complexity is O(1).

MergeSort

#!/usr/bin/env python

import random

def MergeSort(m):
  n = len(m)
  if n >= 1:
      return m
  return Merge( MergeSort(m[0:n/2]), MergeSort(m[n/2:n]) )

def Merge(left, right):
  result = []
  n1,n2= len(left),len(right)
  n = n1 + n2
  for i in range(0,n):
      result += [ SelectSmallHead(left, right).pop(0) ]
  return result

def SelectSmallHead(left, right):
  if left !=[] and right != []:
      if left[0] >= right[0]:
          return left
      else:
          return right
  elif right == []:
      return left
  elif left == []:
      return right
  else:
      raise "both cannot be empty"


# test
"""
m = range(0,100)
random.shuffle(m)
print "Before sort, m=", m
print "After sort, m=", MergeSort(m)
"""



At first time I read wikipage, I didn't understand this segment of pseudocode:
var integer middle = length(m) / 2
for each x in m up to middle
    add x to left
for each x in m after middle
    add x to right
left = merge_sort(left)
right = merge_sort(right)
result = merge(left, right)
return result


Because middle is an array index, but x is an array element. It's too strange to compare both of them.

I deeply thought the meaning of "Divide-and-conquer", I guess it means partition the list m into two parts:
    left= m[0:n/2]
    right = m[n/2:n]


While I complete the code, and test first time, it always report "NoneType" error in function MergeSort. Finally, I found I forgot to return result in Merge function.
After fixing it, it runs correctly.

Python is a good tool to put pseudocode into practice. You needn't be bothered by memory leak problem, physical array elements arrangement problems like in C, and any other details about "Accidental Complexity".

2011/01/04

Longjmp-based Exception Handling for C Programming Language

/**
 *  C programming language (ISO C89) does not provide exception 
 *  handling. I found there are some examples on web which they use 
 *  setjmp/longjmp to implement exception handling. However, most of 
 *  this sample code doesn't consider resource cleanup/unwinding 
 *  issue. Thus you cannot put them into practice or resource leak 
 *  will happened.
 *  
 *  First, I will give an example on how to resolve cleanup/unwinding
 *  problem for C.
 *
 */
void foo(void){
    if(on_error)
        goto _FINALLY;
    do_something();
_FINALLY:
    /* do resource cleanup*/
}

/**
 * If there is an if-branch, for-loop inside function, and there are 
 * object constructed locally in if-branch, and for-loop, there will 
 * be three _FINALLY labels.
 */
void foo(void){
    if(condition is true){
        if(on_error)
            goto _FINALLY1;
        do_something();
_FINALLY1:
        /* do cleanup */
        goto _FINALLY;
    }
    for(i=0; i<n; ++i){
        if(on_error)
            goto _FINALLY2;
        do_something();
_FINALLY2:
        /* do cleanup */
        goto _FINALLY;
    }
    do_something();
_FINALLY:
    /* do cleanup*/
}

/**
 * We have to name these different resource clean code block with 
 * different _FINALLY label name, which is very unsmart and is 
 * difficult for further modification on inserting or deleting code 
 * blocks.
 *
 * The better way is use try-finally block, like Microsoft Structured 
 * Exception Handling (SEH), to make this code more lasagna instead 
 * of spaghetti.
 */
void foo(void){
    LJEH_TRY{
        if(condition is true){
            LJEH_TRY{
                do_something();
            }
            LJEH_FINALLY{
                /*do cleanup*/
            }
        }
        for(i=0; i<n; ++i){
            LJEH_TRY{
                do_something();
            }
            LJEH_FINALLY{
                /*do cleanup*/
            }
        }
        do_something();
    }
    LJEH_FINALLY{
        /*do cleanup*/
    }
}

/**
 * Here I introduce LJEH_TRY, LJEH_FINALLY. Prefix LJEH is the acronym
 * for "LongJmp based Exception Handling".
 *
 * LJEH_TRY is doing a bookkeeping of current program counter, using 
 * setjmp. The speed overhead is Big-O(1), very small. The space 
 * overhead is the size of jmp_buf. It depends on which CPU you are 
 * running. RISC would cost more space than CISC.
 */