
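After taking courses in several areas of computer science, I'd say that I
took a liking to the following areas: distributed computing, parallel computing,
databases, artificial intelligence, and computer vision.
I did several projects in each of these areas and
what can I say? I just love them!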
These are real-world problems, and a great many people around the
globe are working on solving them as we speak. Although people may have the
impression that the Internet has reached its zenith, I'd say it's still in its
infancy. We are not even close to maximizing the Internet's utility, let alone enjoying
the full fruits it can bring us. And I'd love to become one of those who make
people's lives easier by solving some of these problems. Go ahead and knock
yourself out.
Simply put, a distributed system is composed of multiple
hosts that communicate across the Internet to do something useful for us Homo
sapiens. That's right: the machines work for us, but they are far from being as
smart as even the least capable human. If everything worked perfectly at all times and nothing ever
failed, distributed computing would be freed of one of its biggest
challenges. However, the world doesn't work this way: 'If anything can go
wrong, it will' (Murphy's Law).
A host can fail at any moment; a network link can fail at any
moment; a bug can manifest itself at any moment. This is one of the main reasons
distributed computing remains an ongoing research topic. Given such failures, how
can consistency be maintained across every single machine in the system?
Also, the Internet is an extremely dynamic world; hosts come and go. Making
sure the system does its work correctly despite possible failures and
dynamic configuration is extremely difficult. Other issues include communication
latency, the memory access model, scalability, and concurrency control.
All in all, distributed computing is a great way of realizing the idea,
'The Internet is just a big computer,' but we are still far from
making it a reality.
Courses in advanced distributed systems, advanced transactional
database management systems, and Java-centric RMI have given me a solid
background in this area at both the conceptual and the implementation level. In
the database course, we wrote a Global Transaction Manager as an addition to
Sleepycat's Berkeley DB. In the Java-centric RMI course, we used Java RMI to implement
distributed and parallel systems that perform a variety of tasks. We read papers
extensively, and it is intriguing to see people come up with different models to
approach this problem. Our project, GoogleDoc,
is built on distributed and parallel concepts. Feel free to take a look.
Parallel computing is also a huge research area. The idea is that multiple
processors work on one big problem so that it can be solved faster.
Obviously, if a single processor were fast enough, we probably wouldn't need
parallel computing. Unfortunately, the world is made to be challenging and
interesting. When you think about it, you may feel parallel computing is simple
to implement because, conceptually, all you do is split a big problem into a
bunch of smaller ones, solve them in parallel, and compose the results into the
solution to the original problem. That's what I thought initially, but it turns
out that it's only partially true.
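
The split, solve, and compose idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the problem (a sum of squares), the helper names `split` and `parallel_sum_of_squares`, and the default of four workers are all invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Statically cut `data` into at most `parts` contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=4):
    """Split the problem, solve the pieces concurrently, then combine."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is an independent subproblem handed to the pool.
        partials = list(pool.map(lambda c: sum(x * x for x in c), chunks))
    return sum(partials)  # compose the partial results


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1000))))
```

Threads keep the sketch short, but in CPython they will not speed up CPU-bound work like this. The point here is the shape of the decomposition, not the speedup.
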
It's true that many problems can be statically
decomposed into many small ones which can be solved quickly, but it's also true
that many problems cannot be decomposed this way. Many problems are solved by
recursive algorithms whose subproblems only appear as the computation unfolds,
so you don't know in advance how to split them up. A famous example is IBM's Deep Blue
supercomputer, which beat the world chess champion, Garry Kasparov. The machine
searched the tree of possible moves in parallel (thus gaining a huge speedup) and used
several metrics to pick the move with the highest value.
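
To give a feel for scoring candidate moves in parallel and keeping the one with the highest value, here is a toy sketch. It is in no way Deep Blue's algorithm. The game is an assumption chosen for brevity (a pile of stones, each player takes 1 to 3, and whoever takes the last stone wins), and the names `value` and `best_move` are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

MOVES = (1, 2, 3)  # a player may take 1, 2, or 3 stones


@lru_cache(maxsize=None)
def value(stones):
    """Negamax value for the player to move: +1 is a win, -1 a loss."""
    if stones == 0:
        return -1  # the previous player took the last stone and won
    return max(-value(stones - m) for m in MOVES if m <= stones)


def best_move(stones, workers=3):
    """Score every legal root move concurrently; keep the highest value."""
    legal = [m for m in MOVES if m <= stones]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each root move spawns its own recursive search of the subtree.
        scores = list(pool.map(lambda m: -value(stones - m), legal))
    return max(zip(scores, legal))[1]


if __name__ == "__main__":
    print(best_move(5))
```

Note that the subtrees are discovered recursively while the search runs, which is exactly why this kind of problem cannot be split up statically in advance.
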
The course in Java RMI has taught me a lot about distributed and parallel computing, and I've
read many papers on parallel system implementations, including JavaSpaces,
JavaSymphony, Javelin, Cilk, Cx, Jicos, Jini, and Ibis. I just find it riveting. Again,
our project, GoogleDoc, is
built on distributed and parallel concepts. Feel free to take a look!
A database is just a file (or a set of many files) that stores data, and a database management
system, commonly known as a DBMS, is a software application that interacts with
databases and their users. Essentially, you can read data from and write data to the
database, and it seems so simple this way. Each transaction consists of some
number of write operations and some number of read operations, and then it is done.
Simple, huh? Not quite.
If every transaction were executed sequentially,
it would be as simple as I said. However, we are humans, and as such we are greedy!
We want more than a simple sequence of transactions; we want them to
be completed more quickly. That's right; we want SPEED. So somebody figured out
that each transaction can run on its own thread or, better yet, on a remote host.
When transactions are executed in parallel, they tend to finish more quickly
than when they are executed in serial order. That's what we want. Unfortunately,
if we pay no attention to the order in which the operations are executed, we are
in danger of data inconsistency, a terrible disaster in a financial institution.
For example, transaction t1 wants to execute r1(x)w1(x) and t2 wants to execute
w2(x)r2(x). If the execution order is t1t2 or t2t1, it is fine. However, if they
interleave like r1(x)w2(x)w1(x)r2(x), the schedule is not serializable and must be rolled back. As you can see,
t1 reads the value of x, then decides to write to it. t2 butts in and writes
some value to x, so by the time t1 writes to x, x no longer holds the value t1
observed, and t1 overwrites t2's update without ever seeing it. This is not supposed to happen. It's like, 'I see my
balance is $100.00, so I deposit $50.00 to make it $150.00, but someone
deposits $80.00 without my knowing, and in the end my account has
$150.00 instead of $230.00.' Do you see my point? Checking transactions for data dependencies,
unfortunately, is complicated and easy to get wrong.
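
One textbook way to check such dependencies is the precedence-graph test for conflict serializability. The sketch below is only an illustration of that test, not the method used in any project mentioned here. The function names and the regular expression that parses the r1(x)w2(x) notation are assumptions made for the example.

```python
import re
from itertools import combinations


def parse(schedule):
    """Turn 'r1(x)w2(x)' into [('r', '1', 'x'), ('w', '2', 'x')]."""
    return re.findall(r"([rw])(\d+)\((\w+)\)", schedule)


def precedence_edges(ops):
    """Edge a -> b when an operation of a conflicts with a later one of b."""
    edges = set()
    # combinations() keeps schedule order, so the first op always comes earlier.
    for (k1, t1, x1), (k2, t2, x2) in combinations(ops, 2):
        if t1 != t2 and x1 == x2 and "w" in (k1, k2):
            edges.add((t1, t2))
    return edges


def is_conflict_serializable(schedule):
    """True if the precedence graph of the schedule has no cycle."""
    ops = parse(schedule)
    edges = precedence_edges(ops)
    nodes = {t for _, t, _ in ops}
    # Repeatedly peel off transactions that no remaining transaction precedes.
    while nodes:
        free = {n for n in nodes
                if not any(b == n and a in nodes for a, b in edges)}
        if not free:
            return False  # every remaining transaction waits on another: a cycle
        nodes -= free
    return True


if __name__ == "__main__":
    print(is_conflict_serializable("r1(x)w2(x)w1(x)r2(x)"))
```

For the interleaving r1(x)w2(x)w1(x)r2(x), the graph contains both t1 -> t2 and t2 -> t1, so the check fails, while the two serial orders pass.
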
In addition, failures are a big problem for distributed database systems. A
sample scenario is that several databases are deployed in different geographical
regions, and a user application needs to contact different databases for
different data with the help of a system known as the global transaction
manager. You can immediately see the problem that can occur: while a transaction
is being executed, some database may fail and come back up shortly afterwards. You need to
make sure the transaction is aborted and rolled back to the original state, and
EVERY database that is involved in this transaction must act accordingly. In
our GTM project we needed to consider several
failure types and make sure none of them results in database inconsistency.
Hard, huh? But we pulled it off :)
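
The requirement that every involved database aborts together is commonly met with two-phase commit. The sketch below shows that protocol in miniature. It is an assumption that it resembles the GTM project at all, since the text does not say which protocol was used. The `Participant` class, its `healthy` flag, and `run_transaction` are hypothetical names invented for the example.

```python
class Participant:
    """Hypothetical stand-in for one database taking part in a transaction."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy  # False simulates a site that has failed
        self.state = "idle"

    def prepare(self):
        """Phase 1: vote yes only if this site can guarantee a commit."""
        self.state = "prepared" if self.healthy else "aborted"
        return self.healthy

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def run_transaction(participants):
    """Commit everywhere only if every site votes yes; otherwise roll back all."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:  # phase 2: unanimous yes, so commit
            p.commit()
        return "committed"
    for p in participants:  # phase 2: some site said no, so undo everywhere
        p.rollback()
    return "aborted"


if __name__ == "__main__":
    sites = [Participant("east"), Participant("west", healthy=False)]
    print(run_transaction(sites), [s.state for s in sites])
```

A real manager also needs durable logs, timeouts, and recovery for a failed coordinator, all of which this sketch leaves out.
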
My experience in this area comes from the course I took and the project I
did in that course. In the projects I conducted with HSMs,
I also gained experience using an HSM to do field-level encryption of a
database. I also know SQL.
AI is a very common term, and it generally refers to infusing intelligence into a
computer so that it can make decisions based on some observation or input. The
reason we want computers to be able to do that is that
humans are not getting
cheaper or faster or more reliable. Harsh but true. If we can make computers
perform the repetitive tasks that human workers do, then we can replace those
costly workers with cheap (usually one-time cost), reliable,
fast computers. Does that mean more and more jobs will disappear and more and
more people will be scavenging through the streets? Possibly.
Anyway, that's the goal. Isn't the growth of machine intelligence
fascinating? People are theorizing as to whether it is possible that one day
computers will be so smart that they will take over the world, as in the movies
'The Matrix' and 'I, Robot'. Most people think it's impossible
because computers merely follow the instructions people give them. However, I beg to
differ. I believe one day technology will reach a level where every single cell
(and interaction thereof) in the human body can be simulated by a computer. We'll
see.
I am so fascinated by AI that I wrote many interesting programs outside of
class, most of which are puzzle solvers and games. Two
of the board games I wrote, SlimeWar and Reversi,
beat the hell out of human players :)
I have been exposed to many topics, including Bayesian networks, neural networks,
Petri nets, different search algorithms, and so on.
Computer vision is the area where we want to make computers see. An
example is a computer in a car that can drive the car without any human
intervention and reach the destination safely. If you take a look
around, you see many objects and recognize all of them by their names,
properties, uses, etc. Making a computer do the same thing is incredibly
difficult. All the computer sees is a bunch of numbers that describe the colors of
the scene (through a camera or something), and making sense of them is next to
impossible. A computer can follow concrete, precise instructions, not
instructions like 'Gimme a random number' or 'Tell me what you
think about this babe.'
I took a graduate course that taught me a lot about computer vision. I also
single-handedly developed a program in Matlab
that finds the eyes in a person's face in an image.