Population and Sustainable Development

Issues with ethnicity data

Ethnic mobility and context effects

What is ethnic mobility?

Ethnicity is not fixed. Ethnic mobility occurs when people change their ethnic identification over time. For example, people's social environment may change with the result that they identify with additional or different ethnicities. People may also provide different responses depending upon the context or circumstances in which they are asked to state their ethnicities.

Ethnicity should ideally be collected by self-identification, but this is not always possible. Ethnic category jumping may occur because different people provide proxy responses to the ethnicity question. For example, the ethnicity of babies and young children is generally identified by their parents. However, in a later census when these children are old enough to complete their own forms, they will decide for themselves which ethnicities they identify with. These may differ from the original ethnicities identified by their parents.

One aspect of ethnic mobility which causes some concern is a perception that ethnic mobility involves losses to a group. This is not necessarily true, because ethnic mobility often involves not the loss of an ethnicity but the acquisition of additional ethnicities, so that a person who may previously have identified as (for example) being of Tongan ethnicity begins to identify as Tongan and Māori. In some cases, however, there are losses to one or more groups.

Effect of context

Ethnic category jumping can also occur when different ethnicities are reported between different collections (eg a person may identify themselves differently for educational enrolments, benefit applications, census, etc). Part of this effect is due to the mode of collection rather than real ethnic mobility, with the different instruments of collection gathering different perspectives of how the person sees themselves in the particular context.

Difference between mobility and context

Ethnic mobility and context effects are quite different. Ethnic mobility is where people change how they identify their ethnicities over time. The time frame varies depending on the underlying drivers. Changes may reflect long-term processes resulting from a change of social environment or living arrangements, such as partnership formation, a change of job or a move to a different area. They may also be shorter-term, even within the space of a day, although made for similar reasons (a typical example is where people identify themselves differently at work than they would at home).

Generally, issues about ethnic mobility involve the longer term processes because these affect data in time series. They are genuine changes in ethnicity. At an individual level, these relate to personal social changes. However, there are also underlying real-world changes which cause widespread ethnic mobility, often over a relatively short time.

The context effect is quite distinct from ethnic mobility because it reflects how people respond to how (the mode) and where/why (the circumstances) the information is collected.

A common cause of change relates to the perceived purpose of the data. For example, if a person considers that one collection relates to familial information and another to their social environment, the ethnicities they report in each may differ. This does not mean that people are completing the form irresponsibly; rather, they provide the ethnicities that, for the perceived purpose, best reflect how they identify themselves.

The effect of proxy responses is also an issue, because in this case the individual has had their ethnicities identified by a third party on their behalf. Ethnicities identified by proxy may be based on the proxy's perception of that individual's ethnic identity. This situation is most obviously seen in the collection of information on births and deaths. How ethnicity is recorded for births differs in a number of ways from how it is recorded for deaths.

Dealing with the effects of mobility and context

Although the two processes are distinct, the effect is difficult to differentiate. What to do when two data sets contain different information depends on the particular situation.

It is not always valid to take the most recent response. It is rarely known what an individual said on both occasions, which response is in fact the older one, and whether other factors were involved.

No general rule can be given on how to handle each case, except that it is frequently better to take the response associated with the denominator in any rate calculations.

The following example highlights the difficulties of using data for rate calculations when differences in both the context of data collection and potential ethnic mobility are present at the same time.

Example

If you are working out the birth rate for women of Asian ethnicity, you need to use the number of births to women of Asian ethnicity (numerator) and the number of women of childbearing age who are of Asian ethnicity (denominator). However, there may be significant differences in the context of this data - ie, how ethnicity is recorded for the numerator (as part of birth registration) and for the denominator (where ethnicity is derived from self-identification at census).

In this example there are both context and mobility effects. The birth registration usually occurs close to the time of the birth, whereas the census collection on which the estimates are based may have occurred up to five years earlier. Because people may change ethnicity over time, ethnic mobility may affect how the two sources compare with each other.

The context effects arise because the birth registration is generally completed by or with the help of others and may be seen in different terms than the census question, where ethnicity is self-identified.

There are many cases where it is not appropriate or there is no information to suggest a solution. In these cases, either an estimate of the net effect on the data or acknowledgement that this cannot be calculated may be the best solution.
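
One rough way to acknowledge a possible net effect is to report the rate alongside a range. The Python sketch below is illustrative only: the counts are hypothetical, and `max_net_shift` is an assumed proportion by which the numerator might differ if it had been classified in the same way as the denominator. It is not an estimate drawn from any collection.

```python
def rate_per_1000(numerator, denominator):
    """Rate per 1,000 of the denominator population."""
    return 1000 * numerator / denominator


def rate_range(births, women, max_net_shift):
    """Rates if the numerator were shifted down or up by max_net_shift.

    max_net_shift is an assumed proportion (for example 0.05 for 5 percent).
    """
    low = rate_per_1000(births * (1 - max_net_shift), women)
    high = rate_per_1000(births * (1 + max_net_shift), women)
    return low, high


# Hypothetical figures, for illustration only.
births_registered = 1_200  # births recorded to women of the group at registration
women_estimated = 20_000   # women of childbearing age in the group, census-based

observed = rate_per_1000(births_registered, women_estimated)
low, high = rate_range(births_registered, women_estimated, max_net_shift=0.05)
print(f"{observed:.1f} per 1,000 (range {low:.1f} to {high:.1f})")
```

Where no defensible value for the shift can be given, the source's other suggestion applies: state that the net effect cannot be calculated.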

Total response and combination ethnic data

There are currently two principal recommended approaches to ethnic data analysis: total responses and combinations.

Total response ethnic data

Total response ethnic data should be used where possible. Total response data counts every person who identified with an ethnicity included in an ethnic group.

People with responses which fall into more than one group are counted in every group with which they identified. This means that the sum of the ethnic groups will be greater than the number of people.

This is similar to many other variables, such as income sources or iwi affiliations, in which the count of the categories is greater than the count of people.

The advantage of total responses is that the relative size of the groups in the population is fairly represented (remembering to use as the denominator the count only of people for whom ethnicity is available). The proportion of a group that also identifies with other groups may be large.
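
Total response counting can be sketched in Python as follows. The responses are hypothetical, and the function name `total_response_counts` is an illustrative choice rather than part of any standard. An empty set stands for a person whose ethnicity was not stated.

```python
from collections import Counter

# Hypothetical responses: each person may report one or more ethnic groups.
responses = [
    {"European"},
    {"Māori", "European"},
    {"Māori", "Pacific"},
    {"Asian"},
    {"Pacific"},
    set(),  # ethnicity not stated
]


def total_response_counts(people):
    """Count each person once in every group they identified with."""
    counts = Counter()
    for groups in people:
        counts.update(groups)
    return counts


counts = total_response_counts(responses)

# Denominator: only people for whom ethnicity is available.
stated = sum(1 for groups in responses if groups)

shares = {group: n / stated for group, n in counts.items()}
```

Here the group counts sum to 7 although only 5 people stated an ethnicity, so the shares add to more than 100 percent.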

Combination ethnic data

Combination ethnic data provides much more detailed information. It also has the advantage that the count of the categories is the same as the count of people who specified ethnicity.

In many cases the "overlaps" between the ethnic groups can be significant. For example, people who identify with two groups may be characteristically different from people who belong to one of these groups but not the other. These differences may have a regional aspect, different age profiles, different educational achievement levels or different employment patterns.

The disadvantage with combination data is that even at the highest level with six groups (European, Māori, Asian, Pacific, Middle Eastern/Latin American/African, Miscellaneous) there are 61 categories (including Not stated), and some combinations will be very small or empty. While not all combinations would be of interest, care is needed to ensure that the groups chosen are consistent with the limitations imposed by the quality of the data.
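
As a minimal sketch, combination counting assigns each person to exactly one category, defined by the full set of groups they reported. The data and the label format (group names joined with "/") are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical responses; an empty set means ethnicity was not stated.
responses = [
    {"European"},
    {"European"},
    {"Māori", "European"},
    {"Māori", "Pacific"},
    {"Asian"},
    set(),
]


def combination_label(groups):
    """Return one label per person, built from all groups they reported."""
    if not groups:
        return "Not stated"
    return "/".join(sorted(groups))


combination_counts = Counter(combination_label(g) for g in responses)

# Every person falls in exactly one category, so the counts sum to the
# number of people.
assert sum(combination_counts.values()) == len(responses)
```

Even in this small example several categories hold a single person, which illustrates how quickly combination cells become small.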

Using total response and combination data together

Careful analysis of total response data (which will give the total group of interest), together with analysis of the component combinations within the group, is a very powerful method of identifying diversity and dynamics within a group, and may assist in explaining trends.
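
Using the two approaches together might look like the following Python sketch, which breaks the total response count for one group into its component combinations. The data and the helper name `combinations_within` are hypothetical.

```python
from collections import Counter

# Hypothetical responses, one set of ethnic groups per person.
responses = [
    {"Māori"},
    {"Māori", "European"},
    {"Māori", "European"},
    {"Māori", "Pacific"},
    {"European"},
    {"Pacific"},
]


def combinations_within(people, group):
    """Split the total response count for `group` into its combinations."""
    return Counter("/".join(sorted(g)) for g in people if group in g)


maori_components = combinations_within(responses, "Māori")

# The components add up to the total response count for the group.
maori_total = sum(maori_components.values())
```

Each component can then be analysed separately, for example by age or region, to see whether a trend in the total group is driven by one combination.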

Issues, difficulties and incompatibilities between collections

Is ethnicity the appropriate frame of reference?

Perhaps the most important issue is whether using ethnicity as a frame of reference is appropriate in every case. Since much policy is differentiated by ethnicity, analysis by ethnicity is a common starting point. It is also common to assume that ethnicity is a primary causal factor. In some cases, however, a cause or a trend ascribed to ethnic diversity may be driven by another primary factor such as age or socio-economic status.

The age profiles of ethnic groups may be very different (especially for people of multiple ethnicities). Great care is needed when a characteristic of one group as a whole is compared with another group as a whole. Standardisation for age is an important method of adjusting for this.
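
Direct age standardisation, one common form of this adjustment, can be sketched in Python as below. All figures are hypothetical, and the three age bands and the standard population are assumptions chosen for illustration. The two groups have identical age-specific rates but very different age profiles.

```python
def crude_rate(events, population):
    """Overall rate, ignoring age structure."""
    return sum(events.values()) / sum(population.values())


def age_standardised_rate(events, population, standard_population):
    """Weight each age-specific rate by the standard population's age share."""
    total_standard = sum(standard_population.values())
    return sum(
        (events[age] / population[age]) * (standard_population[age] / total_standard)
        for age in standard_population
    )


# Hypothetical standard population and two hypothetical groups.
standard = {"0-14": 200, "15-64": 650, "65+": 150}

events_a = {"0-14": 2, "15-64": 20, "65+": 60}
population_a = {"0-14": 100, "15-64": 400, "65+": 500}  # older profile

events_b = {"0-14": 10, "15-64": 25, "65+": 12}
population_b = {"0-14": 500, "15-64": 500, "65+": 100}  # younger profile

crude_a = crude_rate(events_a, population_a)
crude_b = crude_rate(events_b, population_b)
std_a = age_standardised_rate(events_a, population_a, standard)
std_b = age_standardised_rate(events_b, population_b, standard)
```

The crude rate for group A is almost twice that for group B, yet the standardised rates are equal, because the whole difference comes from age structure.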

Time-series analysis is a central component of policy planning. Collection of ethnicity has changed over time for every data source. It is important to consider the effects of these changes in definition, collection and coding on the data. This is particularly important for Māori data which has been collated in many different ways over time.

Incompatibilities between collections

One of the major issues addressed in the Review of the Measurement of Ethnicity was incompatibility of data from differing sources. Many historical data sources have ethnicity recorded in a number of different ways and different sources often used different questions or processing methods. It is not always easy to work out exactly how sources might be validly compared. Because so much policy analysis requires trend analysis, a good understanding of changes in collections over time is essential.

When data is drawn from more than one source, it is important to consider the effect of the different ways in which the data has been collected and how it has been output. This is especially important in cases where the sources collect different numbers of responses or when the data has been tabulated in different ways, especially if rates are being derived.

For example, a collection that contains just one ethnic response will have used some method of prioritising data to remove excess responses. This has a biasing effect on the data, especially if the method used is systematic, so that conclusions may be very misleading. In the 2001 census, 23 percent of people of Pacific ethnicities under the age of 15 years also identified as Māori. The systematic prioritisation used in the late 1980s and early 1990s, which gave highest priority to Māori, excluded such people from the Pacific count simply because they also identified as Māori, with misleading results for people of Pacific ethnicity.
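
The effect of prioritisation can be sketched in Python as follows. The responses are hypothetical. The only part of the priority order taken from the text is that Māori came first; the order of the remaining groups is an assumption made for illustration.

```python
from collections import Counter

# Māori first, as described above. The rest of the order is assumed.
PRIORITY = ["Māori", "Pacific", "Asian", "European"]


def prioritise(groups):
    """Reduce a multiple response to the single highest-priority group."""
    for group in PRIORITY:
        if group in groups:
            return group
    return None


# Hypothetical responses, one set of ethnic groups per person.
responses = [
    {"Māori", "Pacific"},
    {"Pacific"},
    {"Pacific", "European"},
    {"Māori"},
    {"Māori", "Pacific"},
]

prioritised = Counter(prioritise(g) for g in responses)

total = Counter()
for groups in responses:
    total.update(groups)
```

Four people identified as Pacific, but the prioritised count records only two, because the two people who also identified as Māori were allocated to Māori.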

Different sources for numerator and denominator

A particular problem arises when administrative data (such as police statistics) is being used as a numerator, and survey data (such as census) as the denominator to derive rates. In this case, not only is the data collected in different ways, but the underlying concepts and purposes of the collections may be different.

For example, the ethnicity recorded for offenders may represent physiological appearance as perceived by an arresting officer, whereas the ethnicity recorded by census is self-identified socio-cultural affiliation.

The police record is generally based on the appearance of the alleged offender and recorded by the arresting officer rather than provided by the offender. It may also be less likely than the census data to include multiple ethnicities.

Mobility or context change?

When a piece of information was actually provided may not always be clear. For example, the situation where a health record has one set of responses and a hospital admission form has another might suggest that the admission data is the 'correct' information on the assumption that the health record was much older.

If this is true and the data was recorded correctly on both occasions, this is an example of ethnic mobility. However, care is needed because health records may have been updated in a different phase of the same treatment, reflecting context changes rather than ethnic mobility.

Avoid assuming one answer applies to two questions

One of the really difficult problems in this area arises from how people analyse the data - a common dichotomous analysis is Māori/non-Māori. The assumption is that failure to tick the Māori box is in all cases deliberate - as though people had actually been asked two questions: Are you of Māori ethnicity? and Are you non-Māori? This does not happen and absence of a particular response does not necessarily equate to absence of that characteristic - it may be merely the way that person answered in that context at that time.

This is particularly difficult since people are very interested in such a dichotomous analysis - for example in disparity analysis - and it is very easy to arrive at misleading conclusions. For example, more than half of all people of Māori ethnicity also identify with other ethnicities.

Best practice

Perhaps the most appropriate way of analysing ethnic data is to compare a group with the total number of people with at least one valid response (rather than constructing a fictitious non-group), making sure that things like age-standardisation are done where appropriate. (Remember that for some age groups in the New Zealand population, around one in three Pacific people are also Māori, and one in four people of Māori ethnicity are also of Pacific ethnicity. In addition, 'not specified' cases are by no means insignificant in some collections.)
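
A minimal Python sketch of this comparison follows. The records and the outcome flag are hypothetical. The group of interest is compared with everyone who gave at least one valid response, and 'not specified' cases are excluded from both.

```python
# Hypothetical records: (ethnic groups reported, whether the outcome occurred).
people = [
    ({"Māori"}, True),
    ({"Māori", "European"}, False),
    ({"European"}, False),
    ({"Pacific", "Māori"}, True),
    ({"Asian"}, False),
    (set(), True),  # ethnicity not specified
]


def proportion_with_outcome(records):
    """Share of records where the outcome flag is True."""
    return sum(1 for _, outcome in records if outcome) / len(records)


# Reference group: everyone with at least one valid ethnicity response.
valid = [(groups, outcome) for groups, outcome in people if groups]

# Group of interest on a total response basis.
maori = [(groups, outcome) for groups, outcome in valid if "Māori" in groups]

maori_rate = proportion_with_outcome(maori)
overall_rate = proportion_with_outcome(valid)
```

No 'non-Māori' group is constructed: people with multiple ethnicities stay in the group of interest and in the reference total. Age standardisation would be applied on top of this where the age profiles differ.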

In the context of ethnic mobility (as with other types of migration - such as geographic), the importance of the process lies as much in the differences between the gross inflows and outflows as in the net outcomes.
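
Where responses from two points in time can be linked for the same people, gross and net flows for a group might be tallied as in the Python sketch below. The linked pairs are hypothetical, and such linked data is often not available in practice.

```python
# Hypothetical linked responses for the same people: (earlier, later).
linked = [
    ({"Tongan"}, {"Tongan", "Māori"}),      # gains Māori
    ({"Māori"}, {"Māori"}),                 # no change
    ({"Māori", "European"}, {"European"}),  # loses Māori
    ({"European"}, {"European"}),           # no change
    ({"Samoan"}, {"Samoan", "Māori"}),      # gains Māori
]


def flows(pairs, group):
    """Return (gross inflow, gross outflow, net change) for one group."""
    inflow = sum(1 for before, after in pairs if group not in before and group in after)
    outflow = sum(1 for before, after in pairs if group in before and group not in after)
    return inflow, outflow, inflow - outflow


inflow, outflow, net = flows(linked, "Māori")
```

The net change of one person conceals three individual moves, which is why the gross flows matter as much as the net outcome.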

Dilution effect

The increase in multiple ethnicity is related to the problem which in bicultural studies is often called 'dilution effects'. In many real-world social contexts the trend is towards multi-ethnic identity, as reflected, for example, in births data.

But if a collection tends to encourage single responses (either because of context or because of questionnaire design), the consequences are different for each of the component ethnicities - especially if the shift is a loss from both of the groups of interest.

The environment which the data purports to reflect should be what we are trying to determine.

Population Statistics Unit | Statistics New Zealand Statistics House,

The Boulevard, Harbour Quays, PO Box 2922, Wellington, New Zealand.

Ph: 0508 525 525 Fax:+64 4 931 4079