In fact, cluster sampling, as you have defined it, is a form of random sampling, and this can be shown as follows:

Suppose there are a total of R clusters, labeled C_{1},C_{2},...,C_{R}, and suppose that we intend to select exactly r of them uniformly at random (without replacement), then add every member of the selected clusters to our sample.

Take any individual member of the population, and let S denote the event that this member is ultimately selected for our sample. Of particular interest to us is P(S).

For each i = 1,2,...,R, define A_{i} to be the event that the member is in cluster C_{i}, and define B_{i} to be the event that C_{i} is one of the r clusters that are selected.

We have

P(S) = P(S and A_{1}) + P(S and A_{2}) + ... + P(S and A_{R})

= P(A_{1})•P(S|A_{1}) + P(A_{2})•P(S|A_{2}) + ... + P(A_{R})•P(S|A_{R})

= P(A_{1})•P(B_{1}) + P(A_{2})•P(B_{2}) + ... + P(A_{R})•P(B_{R}).

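The last step uses P(S|A_{i}) = P(B_{i}): a member of C_{i} enters the sample exactly when C_{i} is selected, and the choice of clusters does not depend on which cluster the member belongs to. Now, each P(B_{i}) is equal to r/R, because the r clusters are drawn at random from the pool of R clusters, so every cluster is equally likely to be among those chosen. Factoring out r/R from each of the terms above gives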
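The claim that each P(B_{i}) equals r/R can be checked exactly for small cases by listing every possible choice of r clusters. The following is a minimal sketch, assuming the r clusters are drawn uniformly without replacement; the function name `cluster_inclusion_probabilities` and the values R = 5, r = 2 are illustrative choices, not part of the argument above.

```python
from fractions import Fraction
from itertools import combinations


def cluster_inclusion_probabilities(R, r):
    """Exact P(B_i) for each of R clusters when r are drawn uniformly without replacement."""
    # Every set of r distinct clusters is assumed to be equally likely.
    draws = list(combinations(range(R), r))
    # P(B_i) = (number of draws containing cluster i) / (total number of draws)
    return [
        Fraction(sum(1 for draw in draws if i in draw), len(draws))
        for i in range(R)
    ]


probs = cluster_inclusion_probabilities(R=5, r=2)
print(probs)  # every entry is Fraction(2, 5), i.e. r/R
```

Each cluster appears in 4 of the 10 possible pairs, which gives 4/10 = 2/5 = r/R.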

P(S) = r/R • [ P(A_{1}) + P(A_{2}) + ... + P(A_{R}) ].

The sum of the P(A_{i}) is 1 because the C_{i} partition the population, so that exactly one of the A_{i} must occur. This leaves just

P(S) = r/R.

Notice that this probability is constant. It does not depend on which member of the population is being considered. In other words, *every* member of the population has the same probability r/R of being selected for the sample. By definition, a sampling method which gives each member of the population the same probability of being selected for the sample is a form of random sampling.
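The result can also be checked empirically. The sketch below simulates cluster sampling on a small population whose clusters have deliberately unequal sizes and estimates each member's chance of ending up in the sample. The function name `estimate_selection_probabilities`, the cluster sizes, the number of trials, and the seed are all assumptions made for illustration.

```python
import random


def estimate_selection_probabilities(cluster_sizes, r, trials=20000, seed=0):
    """Estimate, for every population member, the probability of being sampled."""
    rng = random.Random(seed)
    R = len(cluster_sizes)
    # cluster_of[m] is the index of the cluster that member m belongs to.
    cluster_of = [i for i, size in enumerate(cluster_sizes) for _ in range(size)]
    hits = [0] * len(cluster_of)
    for _ in range(trials):
        # Draw r distinct clusters uniformly at random.
        chosen = set(rng.sample(range(R), r))
        # Every member of a chosen cluster joins the sample.
        for member, cluster in enumerate(cluster_of):
            if cluster in chosen:
                hits[member] += 1
    return [h / trials for h in hits]


# Four clusters of sizes 1, 3, 6 and 2; select r = 2 of them.
estimates = estimate_selection_probabilities([1, 3, 6, 2], r=2)
print(min(estimates), max(estimates))  # both should be close to r/R = 0.5
```

Members of the same cluster always share an identical estimate, since they are selected together, and the estimates for different clusters should agree up to simulation noise regardless of cluster size.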