METHODOLOGICAL NOTE
"Spinning digits" may, on the surface, seem economical, but there are a number of inherent risks that the user should be aware of.
"Near-RDD sampling" has probably been around as long as true RDD telephone sampling has, and there are probably as many variations as there are research companies and commercial sample suppliers. Unfortunately, most of these shortcuts can seriously affect a study's reliability, and they are usually used either through ignorance or on a bet that clients will never know the difference, all to save a few dollars per study on sample costs. To make matters even more confusing, only two commercial sampling companies (GENESYS and STS) provide true epsem (equal probability of selection method) RDD samples; most sampling companies offer RDD approximations. Even GENESYS offers two carefully controlled RDD approximations.
Although the full array of RDD samples and approximations is staggering, this Methodological Note focuses only on plus-1 sampling and the related procedures often called "spinning digits": methods that some research companies have traditionally used to "stretch" their sampling dollars.
These companies purchase a telephone sample from a supplier and then add an arbitrary number to each of the original numbers purchased. For example, simply adding "1" to each number in a telephone sample doubles the quantity of sample numbers obtained for the same price. The original set of sample telephone numbers is sometimes used as a set of "seeds" for a year or even longer, simply by adding a different number each time.
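The procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not any company's actual process: it assumes seeds are stored as 10-digit strings and that the increment wraps inside the seed's 100-number bank (the first eight digits). Both the wrapping rule and the function names `plus_k` and `stretch` are hypothetical; many users simply add k to the whole number instead.

```python
BLOCK = 100  # assumption: increments wrap inside a 100-number bank

def plus_k(seeds: list[str], k: int) -> list[str]:
    """Add k to each 10-digit seed, keeping the result in the seed's 100-bank."""
    derived = []
    for seed in seeds:
        bank, line = seed[:8], int(seed[8:])
        derived.append(f"{bank}{(line + k) % BLOCK:02d}")
    return derived

def stretch(seeds: list[str], increments: list[int]) -> list[str]:
    """Original seeds plus one derived batch per increment (e.g. +1, +2, +3)."""
    sample = list(seeds)
    for k in increments:
        sample.extend(plus_k(seeds, k))
    return sample

seeds = ["2125550142", "3125559999"]
print(plus_k(seeds, 1))                # ['2125550143', '3125559900']
print(len(stretch(seeds, [1])))        # 4: one purchase, twice the numbers
print(len(stretch(seeds, [1, 2, 3])))  # 8
```

Reusing the same seeds with increments of 2, 3, and so on is what allows one purchase to be "stretched" over many studies.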
The key issues which arise with plus-1 sampling are:
- Statistical risks,
- Operational and efficiency problems, and
- The positions that sample suppliers take with regard to clients who use their samples as seeds.
Statistical Risks
The most compelling argument in favor of plus-1 sampling applies when one starts with a true RDD sample as seeds and simply increments each telephone number: arguably, the sample distribution is unaffected, and one would expect the same results from both. Moreover, successively adding further digits does not alter this conclusion.
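The argument can be checked with a small simulation. The sketch below assumes 10-digit numbers grouped into 100-number banks (first eight digits), draws an equal-probability sample from a few hypothetical banks, and adds 1 to each seed while wrapping inside the bank; the bank codes and helper names are illustrative only. Because adding 1 merely reshuffles the 100 numbers within a bank, each number keeps the same chance of selection and each bank keeps the same share of the sample.

```python
import random
from collections import Counter

BLOCK = 100  # assumption: a bank is 100 numbers sharing the first eight digits

def bank(number: str) -> str:
    """Area code + exchange + first two line digits."""
    return number[:8]

def plus_k(number: str, k: int) -> str:
    """Add k to the last two digits, wrapping inside the same bank."""
    return number[:8] + f"{(int(number[8:]) + k) % BLOCK:02d}"

def epsem_sample(banks: list[str], n: int, rng: random.Random) -> list[str]:
    """Equal-probability draw of n distinct numbers from all numbers in the banks."""
    picks = rng.sample(range(len(banks) * BLOCK), n)
    return [banks[i // BLOCK] + f"{i % BLOCK:02d}" for i in picks]

rng = random.Random(1993)
banks = ["21255501", "21255502", "31255577"]  # hypothetical banks
seeds = epsem_sample(banks, 60, rng)
derived = [plus_k(s, 1) for s in seeds]

# Every derived number stays in its seed's bank, so each bank's share is unchanged.
assert Counter(map(bank, seeds)) == Counter(map(bank, derived))
# Within a bank, +1 permutes the 100 line numbers, so equal selection
# probabilities for seeds imply equal probabilities for derived numbers.
assert sorted((x + 1) % BLOCK for x in range(BLOCK)) == list(range(BLOCK))
print("per-bank counts:", dict(Counter(map(bank, derived))))
```

Note that this holds only because every seed is kept; the check says nothing about what happens once seeds are purged or the bank list goes stale.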
However, and this is where companies ultimately run into problems, there are limitations, and most companies are not aware of what they are until it is too late and their client discovers a major problem. We've heard of companies who have used the same set of "seeds" for years - ignoring new exchanges, new working banks, and even forgetting about new area codes. Obviously, serious problems with the sampling frame can arise.
As long as one starts with an epsem RDD sample and updates the seeds frequently, the risks are not very large. Many who use plus-1 procedures run into serious problems when these precautions are not taken, or, even worse, when the seeds have been purged of business numbers or screened to identify nonworking numbers. The original epsem RDD sample is representative of all residential telephone numbers. However, when sample numbers are eliminated from use as seeds because they are nonworking or business numbers, this effectively excludes the entire groups of exchanges represented by those seeds from all future samples. In other words, every future sample created from those seeds will be biased, and there is no way to correct it.
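A toy calculation shows why purging the seeds cannot be undone. Assume, for simplicity, that "working" means an eligible residential number, that the increment wraps inside a 100-number bank, and that only working seeds are kept. A residential line r can then be reached by a plus-1 sample only if line r - 1 was itself a working seed. The two banks below are hypothetical, chosen only to contrast a densely assigned bank with a sparsely assigned one.

```python
BLOCK = 100  # assumption: the increment wraps inside a 100-number bank

def reachable(working: set[int], k: int = 1) -> set[int]:
    """Working lines a plus-k sample can ever select when seeds were
    purged down to working numbers: line r is reachable only if
    line r - k was itself a working seed."""
    return {(w + k) % BLOCK for w in working} & working

# Hypothetical banks: one densely assigned, one sparsely assigned.
banks = {
    "dense": set(range(80)),
    "sparse": {5, 6, 30, 50, 51, 52, 77, 90},
}

for name, working in banks.items():
    hit = reachable(working)
    print(f"{name}: {len(hit)} of {len(working)} working lines reachable")
# dense: 79 of 80 working lines reachable
# sparse: 3 of 8 working lines reachable
```

In this example, the sparse bank holds 8 of the 88 working lines (about 9%) but only 3 of the 82 reachable ones (under 4%). The five unreachable lines there have zero probability of selection in every future plus-1 sample, so no weighting scheme can restore them.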
Operational and Efficiency Problems
Obviously, the primary motivation for using plus-1 and short-cutting the sampling process is to save a few dollars on sample costs. The difficulty is that this may backfire through unexpected increases in data collection costs, because: (1) working banks and exchanges go "out of service," which translates into more nonworking numbers; (2) when an area code is split, exchanges assigned the new area code return "intercept messages" informing the caller of the new area code and requiring a redial, which adds time to each dialing, and after the grace period all of those numbers become nonworking; and (3) incrementing the seeds may push a sample number into a portion of the exchange where there are no residential numbers.
None of these shows up as a line-item increase in a cost estimate, so they are difficult to identify. They do, however, affect the bottom line, and it can be very difficult to determine how much of an effect they have had.
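Points (1) and (3) can be illustrated with the literal "add 1 to the number" approach: a seed near the end of a 100-number bank rolls over into the next bank, which may be unassigned or out of service. In the sketch below, the working-bank list, the seed numbers, and the function names are all hypothetical; in practice, such a list would have to come from a current update of the sampling frame.

```python
def plus_k_naive(number: str, k: int) -> str:
    """Literal 'add k to the telephone number', with no wrapping."""
    return f"{int(number) + k:010d}"

def split_by_bank(numbers: list[str], working_banks: set[str]) -> tuple[list[str], list[str]]:
    """Separate numbers whose 100-bank (first eight digits) is on a
    current working-bank list from those that are not."""
    keep, wasted = [], []
    for n in numbers:
        (keep if n[:8] in working_banks else wasted).append(n)
    return keep, wasted

# Hypothetical inputs
working_banks = {"21255501", "31255577"}
seeds = ["2125550142", "2125550199", "3125557750"]
derived = [plus_k_naive(s, 1) for s in seeds]
keep, wasted = split_by_bank(derived, working_banks)
print(keep)    # ['2125550143', '3125557751']
print(wasted)  # ['2125550200']
```

Every number in `wasted` is a dial that reaches no household, a cost that appears nowhere in the sample invoice.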
If the causes of increased data collection costs discussed above seem to be minor concerns, please keep in mind that the GENESYS database went through five updates in 1992 in order to keep the sample frame current and efficient.
Positions of Sample Suppliers on Plus-1
Some companies explicitly prohibit their clients from using their samples as seeds, claiming a proprietary interest in the samples they sell (thus prohibiting using plus-1). We have taken a very different stance on the issue of clients using a plus-1 process on GENESYS samples. We have decided against attempting to police a practice that cannot be prevented and we prefer not to develop a policy that cannot be enforced.
It has always been our position that if a client of ours wishes to use this procedure, it is their decision to make, and they are within their rights to do so. However, they must bear sole responsibility for any risk and potential fallout that may result from this practice. The guarantees and backups that apply to all GENESYS samples end with the original sample. When digits are added to a GENESYS sample, it is no longer a GENESYS sample. We simply will not support samples altered in this way; you are on your own. We trust that such samples are not represented as GENESYS samples to anyone, because they are not.
Using plus-1 samples does carry significant risks, especially if the original sample used to provide the seeds is not strictly representative. Moreover, the indirect costs may more than offset the price of a new, up-to-date, accurate, representative, and probably more efficient RDD sample.
We practice full disclosure on an ethical basis for all of our sampling processes, and we think that our clients respect that honesty. We also believe that your clients deserve the same, and as long as they are fully aware of any shortcuts used and the risks associated with them, then there really is no problem. But please keep in mind that only the sample generated through GENESYS is a GENESYS sample.