Michael Wen's Areas of Interest in Programming

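Michael Wen 溫泰皓

After taking courses in several areas of computer science, I'd say that I took a liking to the following areas: distributed computing, parallel computing, database management systems, artificial intelligence and machine learning, and computer vision.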

I did several projects in each of the areas and what can I say? I just love them!

These are real-world problems, and millions and millions of people around the globe are working on solving them as we speak. Although many people have the impression that the Internet has reached its zenith, I'd say it's still in its infancy. We are not even close to maximizing the Internet's utility, let alone enjoying the full fruit it can bring us. And I'd love to become one of those who make people's lives easier by solving some of these problems. Go ahead and knock yourself out.

Distributed Computing
Simply put, a distributed system is composed of multiple hosts that communicate across the Internet to do something useful for us Homo sapiens. That's right. These machines are our slaves, but they are far from being as smart as even the dumbest human. If everything worked perfectly at all times and nothing ever failed, distributed computing would be freed of one of its biggest challenges. However, the world doesn't work this way: 'If anything can go wrong, it will' (Murphy's Law).

A host can fail at any moment; a network link can fail at any moment; a bug can manifest itself at any moment. This is one of the main reasons distributed computing remains an ongoing research topic. With failures in the picture, how can consistency be maintained across every single machine in the system? Also, the Internet is an extremely dynamic world; hosts come and go. Making sure the system does its work correctly despite possible failures and a changing configuration is extremely difficult. Other issues include communication latency, memory access models, scalability, and concurrency control. All in all, distributed computing is a great way of realizing the idea, 'The Internet is just a big computer,' but we are still far from making it a reality.

A course in advanced distributed systems, one in advanced transactional database management systems, and one in Java-centric RMI have given me a solid background in this area at both the conceptual level and the implementation level. In the database course, we wrote a Global Transaction Manager as an addition to Sleepycat's Berkeley DB. In Java-centric RMI, we used Java RMI to implement distributed and parallel systems that perform a variety of tasks. We read papers extensively, and it is intriguing to see people come up with different models to approach this problem. Our project, GoogleDoc, is built on distributed and parallel concepts. Feel free to take a look.

Parallel Computing
Parallel computing is also a huge research area. The idea is that multiple processors work on one big problem so that it can be solved faster. Obviously, if a single processor were fast enough, we probably wouldn't need parallel computing. Unfortunately the world is made to be challenging and interesting. When you think about it, you may feel parallel computing is simple to implement because conceptually, all you do is split a big problem into a bunch of smaller ones, solve them in parallel, and combine the results into the solution to the original problem. That's what I thought initially, but it turns out that it's only partially true.
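The split, solve, and combine idea can be sketched in a few lines. This is only an illustration under my own assumptions: the problem (a sum of squares), the function names, and the choice of Python's `concurrent.futures` are all made up for the example and have nothing to do with the course projects. Threads keep the sketch short; for CPU-bound work in CPython you would likely swap in `ProcessPoolExecutor` to get a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Solve one small sub-problem: the sum of squares of one chunk."""
    return sum(n * n for n in chunk)


def parallel_sum_of_squares(numbers, workers=4):
    """Statically split the input, solve the chunks concurrently, then combine.

    `workers` is a hypothetical tuning knob chosen for this example.
    """
    numbers = list(numbers)
    # Ceiling division so that we create at most `workers` chunks.
    size = max(1, -(-len(numbers) // workers))
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))

    # Compose the partial answers into the answer to the original problem.
    return sum(partial_results)


if __name__ == "__main__":
    print(parallel_sum_of_squares(range(1, 101)))
```

This works only because the split is known before any work starts and the chunks do not depend on each other.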

It's true that many problems can be statically decomposed into many small ones which can be solved quickly, but it's also true that many problems cannot be decomposed this way. Many problems use algorithms that are what we call recursion-based: the work unfolds as the computation runs, so you don't know in advance how to split it up. A famous example is IBM's Deep Blue supercomputer, which beat the world chess champion, Garry Kasparov. The machine searched an enormous number of possible moves in parallel (thus gaining a huge speedup) and used several metrics to pick the move with the highest value. The course in Java RMI has taught me a lot about distributed and parallel computing, and I've read many papers on parallel system implementations, including JavaSpaces, JavaSymphony, Javelin, Cilk, CX, JICOS, Jini, and Ibis. I just find it riveting. Again, our project, GoogleDoc, is built on distributed and parallel concepts. Feel free to take a look!

Database Management System
A database is just a file (or a set of many files) that stores data, and a database management system, commonly known as a DBMS, is a software application that interacts with databases and their users. Essentially you can read data from the database and write data to it, and it seems so simple this way. Each transaction consists of some number of write operations and some number of read operations, and then it is done. Simple, huh? Not quite.

If every transaction were executed sequentially, then it would be as simple as I said. However, we are humans, and as such we are greedy! We want more than a simple sequence of transactions; we want them to be completed more quickly. That's right; we want SPEED. So somebody figured out that each transaction can run on its own thread, or better yet, on a remote host. When transactions are executed in parallel, they tend to finish more quickly than when they are executed serially. That's what we want. Unfortunately, if we pay no attention to the order in which the operations are executed, we are in danger of data inconsistency, a terrible disaster in a financial institution.

For example, transaction t1 wants to execute r1(x)w1(x) and t2 wants to execute w2(x)r2(x). If the execution order is t1t2 or t2t1, it is fine. However, if they interleave like r1(x)w2(x)w1(x)r2(x), then this execution must be rolled back. As you can see, t1 reads the value of x, then decides to write to it. t2 butts in and writes some value to x, so by the time t1 writes to x, x no longer holds the value t1 observed. This is not supposed to happen. It's like, 'I see my balance is $100.00, so I deposit $50.00 to make it $150.00, but someone deposits $80.00 in between without my knowing, and in the end my account has $150.00 instead of $230.00.' Do you see my point? Checking transactions for data dependencies, unfortunately, is complicated and easy to get wrong.
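One textbook way to check a schedule like this is to build a precedence graph and look for a cycle. The sketch below is just that textbook test written in Python, not the method our project used; the function names and the schedule string format are my own assumptions for the example. Two operations conflict when they come from different transactions, touch the same item, and at least one of them is a write.

```python
import re
from itertools import combinations


def parse_schedule(text):
    """Turn 'r1(x)w2(x)' into [('r', 1, 'x'), ('w', 2, 'x')]."""
    return [(op, int(txn), item)
            for op, txn, item in re.findall(r"([rw])(\d+)\((\w+)\)", text)]


def precedence_edges(schedule):
    """Add an edge a -> b when an operation of a conflicts with a later one of b."""
    edges = set()
    # combinations() keeps schedule order, so the first operation is the earlier one.
    for (op_a, t_a, item_a), (op_b, t_b, item_b) in combinations(schedule, 2):
        if t_a != t_b and item_a == item_b and "w" in (op_a, op_b):
            edges.add((t_a, t_b))
    return edges


def is_conflict_serializable(schedule_text):
    """A schedule is conflict-serializable iff its precedence graph has no cycle."""
    graph = {}
    for a, b in precedence_edges(parse_schedule(schedule_text)):
        graph.setdefault(a, set()).add(b)

    visiting, done = set(), set()

    def has_cycle(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current DFS path
        visiting.add(node)
        found = any(has_cycle(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return not any(has_cycle(node) for node in list(graph))


if __name__ == "__main__":
    print(is_conflict_serializable("r1(x)w2(x)w1(x)r2(x)"))  # interleaved
    print(is_conflict_serializable("r1(x)w1(x)w2(x)r2(x)"))  # t1 then t2
```

For the interleaved schedule, r1(x) before w2(x) forces t1 ahead of t2, while w2(x) before w1(x) forces t2 ahead of t1, so the graph has a cycle and no serial order can explain the result.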

In addition, failures are a big problem for distributed database systems. A sample scenario is that several databases are deployed in different geographical regions, and a user application needs to contact different databases for different data with the help of a system known as the global transaction manager. You can immediately see the problem that can occur: while a transaction is being executed, some database may fail and come back up shortly afterward. You need to make sure the transaction is aborted and rolled back to the original state, and EVERY database involved in the transaction must act accordingly. In our GTM project we needed to consider several failure types and make sure none of them resulted in database inconsistency. Hard, huh? But we pulled it off :)
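To give a feel for the 'every database must act accordingly' requirement, here is a toy sketch of two-phase commit, a standard protocol for this kind of all-or-nothing decision. I am not claiming this is how our GTM worked; the `Participant` class, its fields, and `run_global_transaction` are hypothetical names invented for the example, and a real system would also have to handle timeouts, logging, and recovery.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A toy database site (hypothetical, for illustration only)."""
    name: str
    will_fail: bool = False  # simulate a failure during the prepare phase
    committed: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, writes):
        """Phase 1: stage the writes and vote yes, or vote no on failure."""
        if self.will_fail:
            return False
        self.staged = dict(writes)
        return True

    def commit(self):
        """Phase 2, everyone voted yes: apply the staged writes."""
        self.committed.update(self.staged)
        self.staged = {}

    def rollback(self):
        """Phase 2, someone voted no: throw the staged writes away."""
        self.staged = {}


def run_global_transaction(work):
    """`work` is a list of (participant, writes) pairs.

    Returns True if every site committed, False if every site rolled back.
    """
    votes = [site.prepare(writes) for site, writes in work]
    if all(votes):
        for site, _ in work:
            site.commit()
        return True
    for site, _ in work:
        site.rollback()
    return False


if __name__ == "__main__":
    east, west = Participant("east"), Participant("west", will_fail=True)
    print(run_global_transaction([(east, {"x": 1}), (west, {"y": 2})]))
    print(east.committed, west.committed)
```

The point of the two phases is that no site makes its writes visible until all sites have agreed, so one failing database leaves all of them in their original state.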

My experience in this area comes from the course I took and the project I did in that course. Also, in the projects I conducted with HSMs, I gained experience using an HSM to do field-level encryption of a database. I also know SQL.

Artificial Intelligence/Machine Learning
AI is a very common term, and it generally refers to building intelligence into a computer so that it can make decisions based on some observation or input. The reason we want computers to be able to do that is that humans are not getting cheaper or faster or more reliable. Harsh but true. If we can make computers perform the repetitive tasks that human workers do, then we can get rid of those costly workers and replace them with cheap (usually a one-time cost), reliable, fast computers. Does that mean more and more jobs will disappear and more and more people will be scavenging through the streets? Possibly.

Anyway, that's the goal. Isn't the growth of machine intelligence fascinating? People are theorizing as to whether it is possible that one day computers will be so smart that they will take over the world, as in the movies 'The Matrix' and 'I, Robot'. Most people think it's impossible because a computer merely follows the instructions people give it. However, I beg to differ. I believe one day technology will reach a level where every single cell in the human body, and every interaction among those cells, can be simulated by a computer. We'll see.

I am so fascinated by AI that I wrote many interesting programs outside of class, most of which are puzzle solvers and games. Two of the board games I wrote, SlimeWar and Reversi, beat the hell out of human players :) I have been exposed to many topics, including Bayesian networks, neural networks, Petri nets, different search algorithms, and so on.

Computer Vision
Computer vision is the area where we want to make computers see. An example is a computer in a car that can drive the car without any human intervention and reach the destination safely. If you take a look around, you see many objects and recognize all of them by their names, properties, uses, etc. Making a computer do the same thing is incredibly difficult. All a computer sees is a bunch of numbers that describe the colors of the scene (through a camera or something), and making sense of them is next to impossible. A computer can follow concrete, precise instructions, not instructions like 'Gimme a random number' or 'Tell me what you think about this babe.' I took a graduate course that taught me a lot about computer vision. I also single-handedly developed a program in Matlab that finds the eyes in an image of a person's face.
