Alright, so the basic problem is you have some system with a behavior
described by a differential equation, but you either know that no
classical (smooth) solution exists, or you think it might but aren't sure.
For example, say you need 2nd derivatives somewhere, but you also know the
solution must have kinks (discontinuities in some derivative) elsewhere because
you have constraints that enforce them! Another example is you have
poles/singularities but don't know where. (If you know in advance that
you're gonna have poles in a specific location, you can exclude that
location from consideration by removing it from your domain; but if you
only know there might be poles but you don't know where, what are you gonna do?)
Basically, you have reason to suspect that the solution to your equation might
not be "nice" (C^n smoothly differentiable everywhere etc.). Classical theory
doesn't have much to say about such problems.
Enter weak solutions. Essentially, you relax *all* the requirements considerably,
and replace them with much weaker ones (hence the name). For example, you replace
something strong like C^1(X) (the set of all continuously differentiable and say
real-valued functions on the set X) with something much weaker like W^{1,2}(X)
(the Sobolev space of square-integrable functions with one weak derivative that
is itself square-integrable).
What does this mean? Well, it gets technical quick. But basically, functions in L^p
or Sobolev spaces can be pretty ugly. They can have holes in their domain (as long
as the holes are not "big", i.e. as long as they have measure zero), they can be discontinuous
(very much so), the works.
In fact, they're so messy, if you're given some element f of say L^2(X), you can't
even evaluate it at some point x to get f(x); that's because the elements of L^2(X)
aren't even functions, they're sets (equivalence classes) of "similar" functions (in
a way I won't make precise here). The one thing you *can* safely do with elements
of L^p(X) is to calculate integrals with them. For example, say X=[-1,1] for
simplicity, and we want to know what f(0) is. We can't just ask, because the f we got
back isn't even a function and might be weird. But we can try to determine the average
value in some neighborhood of 0 by integrating:
1/(2e) * (integral(x=-e .. e) f(x) dx)
and then as we let e->0, we hope that this expression should converge to a meaningful
value for f(0). Essentially, we take our weird, spiky and possibly riddled-with-holes
function and sweep some sandpaper over it. Large values of e are coarse sandpaper;
they smooth out even very large bumps but destroy a lot of the features. Small values of
e are fine sandpaper; just enough to take the edge off the worst jumps, but hopefully
still preserving the overall appearance. And with our limit of letting e->0, we're in
effect using finer and finer sandpaper ad infinitum.
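To make that concrete, here's a throwaway numerical sketch of the box-filter averaging (my own toy example; the function, grid, and window sizes are all arbitrary choices):

```python
import numpy as np

def f(x):
    # A deliberately ugly "function": jumps from -1 to +1 at x = 0, and takes
    # the nonsense value 42 at the single point x = 0. The set {0} has measure
    # zero, so no integral ever sees the 42.
    return np.where(x == 0.0, 42.0, np.sign(x))

def box_average(x0, e, n=100_000):
    # 1/(2e) * integral(x = x0-e .. x0+e) f(x) dx, via a midpoint Riemann sum
    xs = x0 - e + (np.arange(n) + 0.5) * (2 * e / n)
    return f(xs).mean()

for e in (1.0, 0.1, 0.001):
    print(e, box_average(0.0, e))  # -> 0.0: the symmetric jump averages out
print(box_average(0.5, 0.01))      # -> 1.0: away from the jump we just recover f
```

Note how the nonsense value at the single point x = 0 is completely invisible to the averages - that's exactly why elements of L^p are equivalence classes: functions that differ only on a measure-zero set produce identical integrals.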
It turns out that this type of process actually works just fine for L^p functions. And
more generally, we don't want to use our ugly box window kernel, but something nicer;
the actual form to get a value, let's say "f~(x)" (limited ASCII, imagine the tilde over
f) for f(x) is
(*) f~(x) = integral(y in X) f(y) phi(x-y) dy
where phi is a suitable "test function". Note that this is just a convolution of f with
phi. A proper test function is smooth (normally C^infinity, because why not), has compact
support (that is, is 0 outside of a bounded interval) and integrates to 1. You can think
of it as a mathematically nice blur kernel that actually falls off to 0 after a finite
distance (so a Gaussian won't work). And usually they come in families with some parameter
that allows us to do the "shrinking radius" trick we pulled with the basic box filter.
Okay, so we can't evaluate f directly, but we can convolve it with a test function to get
what's a reasonable notion of that function's value at any particular point. And in
particular we get an f~ out that approximates f (up to some smoothing that we control) but
is very nice indeed (since it inherits all the C^infinity smoothness from our test function).
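Here's what that looks like numerically - again a toy sketch of my own, using the standard bump-function test kernel (the normalization is done numerically out of laziness; none of the constants are canonical):

```python
import numpy as np

def phi(x, e=1.0):
    # The standard bump test function, scaled to support [-e, e]:
    # exp(-1/(1 - (x/e)^2)) for |x| < e, identically 0 outside.
    # It's C^infinity smooth everywhere and has compact support.
    y = np.zeros_like(x, dtype=float)
    m = np.abs(x) < e
    y[m] = np.exp(-1.0 / (1.0 - (x[m] / e) ** 2))
    return y

def mollify(f, x0, e, n=20_000):
    # f~(x0) = integral f(y) phi(x0 - y) dy, with the kernel normalized
    # numerically so that it integrates to 1 (lazy but adequate here)
    ys = np.linspace(x0 - e, x0 + e, n)
    w = phi(x0 - ys, e)
    return (f(ys) * w).sum() / w.sum()

print(mollify(np.sign, 0.3, 0.1))  # -> 1.0: away from the jump, f~ agrees with f
print(mollify(np.sign, 0.0, 0.1))  # -> ~0.0: at the jump, f~ interpolates smoothly
```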
Progress, but why would we bother with all this mess?
Well, it turns out this trick has legs. We can use integration by parts (not bothering with
the details yet again) to show that we can get not just a value for f~ out of (*), but also
for its derivative!
f~'(x) = integral(y in X) f(y) phi'(x-y) dy
(the minus sign from integration by parts gets absorbed by the chain rule, since
d/dy phi(x-y) = -phi'(x-y)). Same general shape of equation, we're just convolving
with a different kernel now. A standard seed test function phi looks like a "smooth
bump". The kernel phi'(x-y), viewed as a function of y, looks like a smooth bump
down, followed by a smooth bump up. It's a smooth, continuous analog of the finite
differencing operation
f'(x) =~ (f(x+h) - f(x-h)) / (2h)
with the "bump down" corresponding to the f(x-h) and the "bump up" corresponding to the
f(x+h) term. And you can keep pulling this trick as often as you want - if our phi is
C^infinity, that means we get as many derivatives for f as we want out of this!
The derivatives computed this way are called "weak derivatives", and they exist and
are well-defined even where classical derivatives don't.
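The same toy setup from before extends to weak derivatives: convolve f with the derivative of the (scaled, normalized) bump kernel. Applied to f(x) = |x|, which classically has no derivative at 0, we get the expected slopes everywhere (again my own sketch; kernel family and grid resolution are arbitrary choices):

```python
import numpy as np

def bump(u):
    # unnormalized bump: exp(-1/(1 - u^2)) on (-1, 1), 0 outside
    y = np.zeros_like(u, dtype=float)
    m = np.abs(u) < 1.0
    y[m] = np.exp(-1.0 / (1.0 - u[m] ** 2))
    return y

def bump_prime(u):
    # its derivative: -2u / (1 - u^2)^2 * bump(u)
    y = np.zeros_like(u, dtype=float)
    m = np.abs(u) < 1.0
    y[m] = -2.0 * u[m] / (1.0 - u[m] ** 2) ** 2 * np.exp(-1.0 / (1.0 - u[m] ** 2))
    return y

# normalization constant: integral of bump over (-1, 1), computed once
_g = np.linspace(-1.0, 1.0, 200_001)
Z = bump(_g).sum() * (_g[1] - _g[0])

def weak_deriv(f, x0, e, n=200_001):
    # f~'(x0) = integral f(y) phi'(x0 - y) dy, using the scaled, normalized
    # kernel phi(u) = bump(u/e) / (e * Z), so phi'(u) = bump_prime(u/e) / (e^2 * Z)
    ys = np.linspace(x0 - e, x0 + e, n)
    k = bump_prime((x0 - ys) / e) / (e ** 2 * Z)
    return (f(ys) * k).sum() * (ys[1] - ys[0])

print(weak_deriv(np.abs, 0.5, 0.1))   # ~ +1: slope of |x| right of the kink
print(weak_deriv(np.abs, -0.5, 0.1))  # ~ -1: slope left of it
print(weak_deriv(np.abs, 0.0, 0.1))   # ~  0: the kernel averages across the kink
```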
If we take a classical differential equation, and we rewrite it in terms of integrals
with test functions (and their derivatives), we get what's called the "weak formulation".
Solutions to boundary value problems phrased this way are called "weak solutions", and
they're much more general than "classical" solutions. (Although there are cases where
one first shows that there exists a weak solution to a problem, and then goes on to show
that the weak solution is actually a classical solution.)
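To see what a weak formulation actually looks like, take the standard textbook example
(my pick, not tied to anything specific above): -u''(x) = g(x) on X = (0,1) with
u(0) = u(1) = 0. Multiply both sides by a test function v, integrate over X, and use
integration by parts to move one derivative from u onto v:
  integral(x=0..1) u'(x) v'(x) dx = integral(x=0..1) g(x) v(x) dx   for all test functions v.
Every classical solution satisfies this, but the rewritten equation only ever mentions
u' - so it makes sense for any u with a single square-integrable weak derivative, i.e.
any u in W^{1,2}((0,1)). That's the sense in which the weak formulation asks for less.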
Why bother with all this? Fundamentally, because weak solutions are much easier to work
with. For classical solutions, you're dealing with spaces like C^1(X), the space of
continuously differentiable functions on X. The individual elements of C^1(X) are nice;
but it's easy to give sequences of elements in C^1(X), say
f_e(x) := sqrt(x^2 + e)
such that f_e is in C^1(X) for all e>0, but the limit for e->0, f_0(x) = |x|, isn't,
since it's not continuously differentiable at 0. (Again not bothering with details
about what exactly I mean by convergence here.) In mathematical terms, C^1(X) is not complete.
This is quite the bummer, since replacing a complicated function with successively finer,
simpler approximations is one of the standard tricks for solving complicated math problems.
Not being able to do this makes C^1(X) a pain to work with. The individual elements may be
nice, but the space as a whole leaves something to be desired.
L^p and Sobolev spaces are essentially the opposite. The individual elements can be messy as
hell, but the *spaces* are great. They're complete (so our sequence example would work),
and they have all kinds of nice properties that allow us to use approximation, smoothing etc.
with lots of freedom to solve our problem, without having to worry about painting
ourselves into a corner.
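For what it's worth, the f_e example from above is easy to poke at numerically (quick sketch, constants arbitrary):

```python
import numpy as np

def f(x, e):
    return np.sqrt(x ** 2 + e)      # continuously differentiable for any e > 0

def df(x, e):
    return x / np.sqrt(x ** 2 + e)  # its (classical) derivative

for e in (1e-2, 1e-4, 1e-6):
    # the sup-distance to the limit |x| is attained at x = 0 and equals
    # sqrt(e), so f_e -> |x| uniformly as e -> 0 ...
    print(e, f(0.0, e))
    # ... but the derivative near 0 tends to the jump sign(x): ~ -1 just left
    # of 0, ~ +1 just right of it. The uniform limit |x| has no continuous
    # derivative, so it can't be in C^1: the space isn't complete.
    print(e, df(-0.01, e), df(0.01, e))
```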