Alright, so the basic problem is you have some system with a behavior
described by a differential equation, but you either know that no
classical (smooth) solution exists, or you think it might but aren't sure.
For example, say you need 2nd derivatives somewhere, but you also know the
solution must have kinks (discontinuities in some derivative) elsewhere because
you have constraints that enforce them! Another example is you have
poles/singularities but don't know where. (If you know in advance that
you're gonna have poles in a specific location, you can exclude that
location from consideration by removing it from your domain; but if you
only know there might be poles but you don't know where, what are you gonna do?)
Basically, you have reason to suspect that the solution to your equation might
not be "nice" (C^n smoothly differentiable everywhere etc.). Classical theory
doesn't have much to say about such problems.
Enter weak solutions. Essentially, you relax *all* the requirements considerably,
and replace them with much weaker ones (hence the name). For example, you replace
something strong like C^1(X) (the set of all continuously differentiable and say
real-valued functions on the set X) with something much weaker like W^{1,2}(X)
(the Sobolev space of square-integrable functions with one weak derivative that
is itself square-integrable).
What does this mean? Well, it gets technical quick. But basically, functions in L^p
or Sobolev spaces can be pretty ugly. They can have holes in their domain (as long
as the holes are not "big", i.e. as long as they have measure zero), they can be discontinuous
(very much so), the works.
In fact, they're so messy, if you're given some element f of say L^2(X), you can't
even evaluate it at some point x to get f(x); that's because the elements of L^2(X)
aren't even functions, they're sets (equivalence classes) of "similar" functions (in
a way I won't make precise here). The one thing you *can* safely do with elements
of L^p(X) is to calculate integrals with them. For example, say X=[-1,1] for
simplicity, and we want to know what f(0) is. We can't just ask, because the f we got
back isn't even a function and might be weird. But we can try to determine the average
value in some neighborhood of 0 by integrating:
1/(2e) * (integral(x=-e .. e) f(x) dx)
and then as we let e->0, we hope that this expression should converge to a meaningful
value for f(0). Essentially, we take our weird, spiky and possibly riddled-with-holes
function and sweep some sandpaper over it. Large values of e are coarse sandpaper;
they smooth out even very large bumps but destroy a lot of the features. Small values of
e are fine sandpaper; just enough to take the edge off the worst jumps, but hopefully
still preserving the overall appearance. And with our limit of letting e->0, we're in
effect using finer and finer sandpaper ad infinitum.
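To make that concrete, here's a throwaway numerical sketch of the box-filter averaging (my own toy example; the function, grid, and window sizes are all arbitrary choices):

```python
import numpy as np

def f(x):
    # A deliberately ugly "function": jumps from -1 to +1 at x = 0, and takes
    # the nonsense value 42 at the single point x = 0. The set {0} has measure
    # zero, so no integral ever sees the 42.
    return np.where(x == 0.0, 42.0, np.sign(x))

def box_average(x0, e, n=100_000):
    # 1/(2e) * integral(x = x0-e .. x0+e) f(x) dx, via a midpoint Riemann sum
    xs = x0 - e + (np.arange(n) + 0.5) * (2 * e / n)
    return f(xs).mean()

for e in (1.0, 0.1, 0.001):
    print(e, box_average(0.0, e))  # -> 0.0: the symmetric jump averages out
print(box_average(0.5, 0.01))      # -> 1.0: away from the jump we just recover f
```

Note how the nonsense value at the single point x = 0 is completely invisible to the averages - that's exactly why elements of L^p are equivalence classes: functions that differ only on a measure-zero set produce identical integrals.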
It turns out that this type of process actually works just fine for L^p functions. And
more generally, we don't want to use our ugly box window kernel, but something nicer;
the actual form to get a value, let's say "f~(x)" (limited ASCII, imagine the tilde over
f) for f(x) is
(*) f~(x) = integral(y in X) f(y) phi(x-y) dy
where phi is a suitable "test function". Note that this is just a convolution of f with
phi. A proper test function is smooth (normally C^infinity, because why not), has compact
support (that is, is 0 outside of a bounded interval) and integrates to 1. You can think
of it as a mathematically nice blur kernel that actually falls off to 0 after a finite
distance (so a Gaussian won't work). And usually they come in families with some parameter
that allows us to do the "shrinking radius" trick we pulled with the basic box filter.
Okay, so we can't evaluate f directly, but we can convolve it with a test function to get
what's a reasonable notion of that function's value at any particular point. And in
particular we get an f~ out that approximates f (up to some smoothing that we control) but
is very nice indeed (since it inherits all the C^infinity smoothness from our test function).
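Here's what that looks like numerically - again a toy sketch of my own, using the standard bump-function test kernel (the normalization is done numerically out of laziness; none of the constants are canonical):

```python
import numpy as np

def phi(x, e=1.0):
    # The standard bump test function, scaled to support [-e, e]:
    # exp(-1/(1 - (x/e)^2)) for |x| < e, identically 0 outside.
    # It's C^infinity smooth everywhere and has compact support.
    y = np.zeros_like(x, dtype=float)
    m = np.abs(x) < e
    y[m] = np.exp(-1.0 / (1.0 - (x[m] / e) ** 2))
    return y

def mollify(f, x0, e, n=20_000):
    # f~(x0) = integral f(y) phi(x0 - y) dy, with the kernel normalized
    # numerically so that it integrates to 1 (lazy but adequate here)
    ys = np.linspace(x0 - e, x0 + e, n)
    w = phi(x0 - ys, e)
    return (f(ys) * w).sum() / w.sum()

print(mollify(np.sign, 0.3, 0.1))  # -> 1.0: away from the jump, f~ agrees with f
print(mollify(np.sign, 0.0, 0.1))  # -> ~0.0: at the jump, f~ interpolates smoothly
```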
Progress, but why would we bother with all this mess?
Well, it turns out this trick has legs. We can use integration by parts (not bothering with
the details yet again) to show that we can get not just a value for f~ out of (*), but also
for its derivative!
f~'(x) = integral(y in X) f(y) phi'(x-y) dy
(the minus sign from integration by parts gets absorbed by the chain rule, since
d/dy phi(x-y) = -phi'(x-y)). Same general shape of equation, we're just convolving
with a different kernel now. A standard seed test function phi looks like a "smooth
bump". The kernel phi'(x-y), viewed as a function of y, looks like a smooth bump
down, followed by a smooth bump up. It's a smooth, continuous analog of the finite
differencing operation
f'(x) =~ (f(x+h) - f(x-h)) / (2h)
with the "bump down" corresponding to the f(x-h) and the "bump up" corresponding to the
f(x+h) term. And you can keep pulling this trick as often as you want - if our phi is
C^infinity, that means we get as many derivatives for f as we want out of this!
The derivatives computed this way are called "weak derivatives", and they exist and
are well-defined even where classical derivatives don't.
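The same toy setup from before extends to weak derivatives: convolve f with the derivative of the (scaled, normalized) bump kernel. Applied to f(x) = |x|, which classically has no derivative at 0, we get the expected slopes everywhere (again my own sketch; kernel family and grid resolution are arbitrary choices):

```python
import numpy as np

def bump(u):
    # unnormalized bump: exp(-1/(1 - u^2)) on (-1, 1), 0 outside
    y = np.zeros_like(u, dtype=float)
    m = np.abs(u) < 1.0
    y[m] = np.exp(-1.0 / (1.0 - u[m] ** 2))
    return y

def bump_prime(u):
    # its derivative: -2u / (1 - u^2)^2 * bump(u)
    y = np.zeros_like(u, dtype=float)
    m = np.abs(u) < 1.0
    y[m] = -2.0 * u[m] / (1.0 - u[m] ** 2) ** 2 * np.exp(-1.0 / (1.0 - u[m] ** 2))
    return y

# normalization constant: integral of bump over (-1, 1), computed once
_g = np.linspace(-1.0, 1.0, 200_001)
Z = bump(_g).sum() * (_g[1] - _g[0])

def weak_deriv(f, x0, e, n=200_001):
    # f~'(x0) = integral f(y) phi'(x0 - y) dy, using the scaled, normalized
    # kernel phi(u) = bump(u/e) / (e * Z), so phi'(u) = bump_prime(u/e) / (e^2 * Z)
    ys = np.linspace(x0 - e, x0 + e, n)
    k = bump_prime((x0 - ys) / e) / (e ** 2 * Z)
    return (f(ys) * k).sum() * (ys[1] - ys[0])

print(weak_deriv(np.abs, 0.5, 0.1))   # ~ +1: slope of |x| right of the kink
print(weak_deriv(np.abs, -0.5, 0.1))  # ~ -1: slope left of it
print(weak_deriv(np.abs, 0.0, 0.1))   # ~  0: the kernel averages across the kink
```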
If we take a classical differential equation, and we rewrite it in terms of integrals
with test functions (and their derivatives), we get what's called the "weak formulation".
Solutions to boundary value problems phrased this way are called "weak solutions", and
they're much more general than "classical" solutions. (Although there are cases where
one first shows that there exists a weak solution to a problem, and then goes on to show
that the weak solution is actually a classical solution.)
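To see what a weak formulation actually looks like, take the standard textbook example
(my pick, not tied to anything specific above): -u''(x) = g(x) on X = (0,1) with
u(0) = u(1) = 0. Multiply both sides by a test function v, integrate over X, and use
integration by parts to move one derivative from u onto v:
  integral(x=0..1) u'(x) v'(x) dx = integral(x=0..1) g(x) v(x) dx   for all test functions v.
Every classical solution satisfies this, but the rewritten equation only ever mentions
u' - so it makes sense for any u with a single square-integrable weak derivative, i.e.
any u in W^{1,2}((0,1)). That's the sense in which the weak formulation asks for less.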
Why bother with all this? Fundamentally, because weak solutions are much easier to work
with. For classical solutions, you're dealing with spaces like C^1(X), the space of
continuously differentiable functions on X. The individual elements of C^1(X) are nice;
but it's easy to give sequences of elements in C^1(X), say
f_e(x) := sqrt(x^2 + e)
such that f_e is in C^1(X) for all e>0, but the limit for e->0, f_0(x) = |x|, isn't,
since it's not continuously differentiable at 0. (Again not bothering with details
about what exactly I mean by convergence here.) In mathematical terms, C^1(X) is not complete.
This is quite the bummer, since replacing a complicated function with successively finer,
simpler approximations is one of the standard tricks for solving complicated math problems.
Not being able to do this makes C^1(X) a pain to work with. The individual elements may be
nice, but the space as a whole leaves something to be desired.
L^p and Sobolev spaces are essentially the opposite. The individual elements can be messy as
hell, but the *spaces* are great. They're complete (so our sequence example would work),
and they have all kinds of nice properties that allow us to use approximation, smoothing etc.
with lots of freedom to solve our problem, without having to worry about painting
ourselves into a corner.
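For what it's worth, the f_e example from above is easy to poke at numerically (quick sketch, constants arbitrary):

```python
import numpy as np

def f(x, e):
    return np.sqrt(x ** 2 + e)      # continuously differentiable for any e > 0

def df(x, e):
    return x / np.sqrt(x ** 2 + e)  # its (classical) derivative

for e in (1e-2, 1e-4, 1e-6):
    # the sup-distance to the limit |x| is attained at x = 0 and equals
    # sqrt(e), so f_e -> |x| uniformly as e -> 0 ...
    print(e, f(0.0, e))
    # ... but the derivative near 0 tends to the jump sign(x): ~ -1 just left
    # of 0, ~ +1 just right of it. The uniform limit |x| has no continuous
    # derivative, so it can't be in C^1: the space isn't complete.
    print(e, df(-0.01, e), df(0.01, e))
```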