Research:Surviving new editor

From Meta, a Wikimedia project coordination wiki
Surviving new editor
Specification
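A surviving new editor is a new editor who completes at least n edits within the activation period since registration and also completes at least m edits in the survival period, which begins once the trial period has elapsed.
WMF Standard
  • n = 1 edit
  • m = 1 edit
  • activation period = 1 day
  • trial period = 30 days (~ one month)
  • survival period = 30 days (~ one month)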
Measures
Editor retention
Aliases
Retained editor
Related metrics
New editor
Status
draft
SQL
SET @activation_period = 1; /* One day */
SET @n = 1; /* One activation edit */
SET @trial_period = 30; /* 30 days */
SET @survival_period = 30; /* 30 days*/
SET @m = 1; /* One survival edit */
SET @start_date = "20140101"; /* January 1st, 2014 after midnight */
SET @end_date = "20140201"; /* February 1st, 2014 before midnight */

SELECT
    user_id,
    user_name,
    user_registration,
    /* "At least n edits" is an inclusive threshold, hence >= rather than >.
       COALESCE maps users with no matching edits (NULL sums) to 0. */
    COALESCE(SUM(activation_edits), 0) >= @n AS activated,
    COALESCE(SUM(activation_edits), 0) >= @n AND
    COALESCE(SUM(surviving_edits), 0) >= @m AS surviving,
    /* Censored: the survival period has not ended yet, so the outcome is still unknown */
    (
        UNIX_TIMESTAMP(NOW()) <
        UNIX_TIMESTAMP(DATE_ADD(user_registration, INTERVAL @trial_period+@survival_period DAY))
    ) AS censored
FROM (
    /* Edits still present in the revision table.
       rev_user and ar_user belong to the older MediaWiki schema. */
    SELECT
        user_id,
        user_name,
        user_registration,
        SUM(
            rev_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S")
        ) AS activation_edits,
        SUM(
            rev_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period+@survival_period DAY), "%Y%m%d%H%i%S")
        ) AS surviving_edits
    FROM user
    LEFT JOIN revision ON
        user_id = rev_user AND
        (
            rev_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S") OR
            rev_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period+@survival_period DAY), "%Y%m%d%H%i%S")
        )
    WHERE user_registration BETWEEN @start_date AND @end_date
    /* Without this GROUP BY the SUMs would collapse all users into a single row */
    GROUP BY user_id, user_name, user_registration
    UNION ALL
    /* Edits to pages that were later deleted, stored in the archive table */
    SELECT
        user_id,
        user_name,
        user_registration,
        SUM(
            ar_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S")
        ) AS activation_edits,
        SUM(
            ar_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period+@survival_period DAY), "%Y%m%d%H%i%S")
        ) AS surviving_edits
    FROM user
    LEFT JOIN archive ON
        user_id = ar_user AND
        (
            ar_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S") OR
            ar_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period+@survival_period DAY), "%Y%m%d%H%i%S")
        )
    WHERE user_registration BETWEEN @start_date AND @end_date
    GROUP BY user_id, user_name, user_registration
) split_edit_counts
/* Combine the revision and archive counts for each user */
GROUP BY user_id, user_name, user_registration;

Surviving new editor is a standardized user class used to measure the number of first-time editors in a wiki project who continue to edit for a substantial period of time. It's used as a proxy for editor retention.
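
As an illustration of the definition, here is a minimal Python sketch that classifies a single user from a registration time and a list of edit times. The function name `is_surviving_new_editor` and its argument names are hypothetical; the defaults mirror the WMF Standard values (1 edit within 1 day, then 1 edit in the 30 days that follow a 30-day trial period). It assumes inclusive window boundaries, as the SQL `BETWEEN` does.

```python
from datetime import datetime, timedelta

def is_surviving_new_editor(registration, edit_times, n=1, m=1,
                            activation=timedelta(days=1),
                            trial=timedelta(days=30),
                            survival=timedelta(days=30)):
    """Return True if the user meets both the activation and survival criteria."""
    activation_end = registration + activation
    survival_start = registration + trial
    survival_end = survival_start + survival
    # Count edits in each window; boundaries are inclusive, like SQL BETWEEN.
    activation_edits = sum(registration <= t <= activation_end for t in edit_times)
    surviving_edits = sum(survival_start <= t <= survival_end for t in edit_times)
    return activation_edits >= n and surviving_edits >= m

# Hypothetical user: one edit two hours after registering, one edit 45 days later.
reg = datetime(2014, 1, 1)
edits = [reg + timedelta(hours=2), reg + timedelta(days=45)]
print(is_surviving_new_editor(reg, edits))      # True
print(is_surviving_new_editor(reg, edits[:1]))  # False: no edit in the survival period
```

Unlike the SQL query, this sketch does not distinguish live revisions from archived ones and does not flag censored users.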

Discussion

The activation period

The activation period selects users whose retention needs to be measured:

  • setting n = 0 measures the retention (or rather, delayed activation) of all newly registered users, regardless of when they started editing.
  • setting n to a value other than 0 restricts the measurement of retention to the subset of users who edited within the given activation period since registration.
  • setting n and the activation period to match the proposed definition of a new editor measures the retention of new editors; in that case, surviving new editors are effectively a proper subset of new editors.
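
The effect of the activation threshold on the cohort can be sketched as follows. This assumes the threshold is the parameter `n` from the SQL above; the helper `activated_users` and the three toy users are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta

def activated_users(users, n, activation=timedelta(days=1)):
    """Return the sorted IDs of users with at least n edits in the activation period."""
    selected = []
    for user_id, (registration, edit_times) in users.items():
        window_end = registration + activation
        count = sum(registration <= t <= window_end for t in edit_times)
        if count >= n:
            selected.append(user_id)
    return sorted(selected)

reg = datetime(2014, 1, 1)
users = {
    "a": (reg, [reg + timedelta(hours=1)]),  # edited on the first day
    "b": (reg, [reg + timedelta(days=10)]),  # first edit came later
    "c": (reg, []),                          # never edited
}
cohort_all = activated_users(users, n=0)  # every newly registered user
cohort_new = activated_users(users, n=1)  # only users who edited within one day
print(cohort_all, cohort_new)
```

With `n=0` the cohort contains all registrations, so later edits by user "b" would count as delayed activation; with `n=1` only user "a" is followed.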

The trial period

During the trial period, new editors are presumed to be testing out Wikipedia and Wikipedians are testing out the editor. This is the time when non-retained editors tend to leave Wikipedia and when retained editors decide to stick around. The longer the duration of this period, the longer an editor will need to remain active in order to be counted.

The survival period

During the survival period, new editors who are retained are expected to show some activity to indicate their survival. The longer the survival period, the more likely we are to notice activity from editors who are less consistently active. Longer survival periods are also more likely to catch users who left Wikipedia and later reactivated their accounts.
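
How the trial and survival periods combine into an observation window, and when a user's outcome is still unknown, can be sketched like this. The helper names `survival_window` and `is_censored` are hypothetical; the censoring rule follows the `censored` column of the SQL above (the current time is earlier than registration plus trial plus survival period).

```python
from datetime import datetime, timedelta

def survival_window(registration, trial=timedelta(days=30),
                    survival=timedelta(days=30)):
    """Return (start, end) of the survival period for one user."""
    start = registration + trial
    return start, start + survival

def is_censored(registration, now, trial=timedelta(days=30),
                survival=timedelta(days=30)):
    """True if the survival period has not finished yet at time `now`."""
    _, end = survival_window(registration, trial, survival)
    return now < end

reg = datetime(2014, 1, 1)
start, end = survival_window(reg)
print(start, end)
print(is_censored(reg, now=datetime(2014, 2, 15)))  # window still open
print(is_censored(reg, now=datetime(2014, 4, 1)))   # window closed
```

Lengthening either period pushes the end of the window later, so more recent registrations remain censored for longer.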

Analysis

Wikis

German

Survival rate comparison (dewiki). The proportion of surviving newly registered users is plotted by registration date for a set of different trial and survival periods.

English

Survival rate comparison (enwiki). The proportion of surviving newly registered users is plotted by registration date for a set of different trial and survival periods.

Sensitivity

Trial period duration

Trial period factor. The factor of difference between proportions of surviving new editors for different trial periods is plotted (based on trial period = 3 months and locking the survival period to 3 months).

The Trial period factor figure plots the ratio between the number of users who edit after a 3-month trial period (the horizontal baseline, where the factor is 1 by definition) and the number of users who edit after 1, 2, 4, 5 and 6 months. Both enwiki and dewiki show a slight trend: the number of users surviving 1 or 2 trial months, relative to 3 or more, is changing over time. The trend is not extreme and therefore might not matter, but it does suggest that even users who survive 1-2 months are becoming less likely to survive 3.
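
A factor of this kind can be sketched as below, assuming it is a simple ratio of each survival proportion to the 3-month value. The function `period_factors` is hypothetical, and the proportions are toy numbers for illustration, not measurements from enwiki or dewiki.

```python
def period_factors(proportions, baseline_months=3):
    """Divide each survival proportion by the proportion at the baseline period."""
    baseline = proportions[baseline_months]
    return {months: p / baseline for months, p in proportions.items()}

# Toy proportions of surviving new editors by trial period in months (hypothetical).
toy_proportions = {1: 0.08, 2: 0.06, 3: 0.04, 6: 0.03}
factors = period_factors(toy_proportions)
for months, factor in sorted(factors.items()):
    print(months, round(factor, 2))
```

The baseline always maps to a factor of 1, which is why it appears as a flat horizontal line; a trend in the other curves indicates that their relationship to the baseline changes with registration date.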

Survival period duration

Survival period factor. The factor of difference between proportions of surviving new editors for different survival periods is plotted (based on survival period = 3 months and locking the trial period to 3 months).

The Survival period factor figure plots the ratio between the number of users who edit within a 3-month window (the horizontal baseline, where the factor is 1 by definition) and the number of users who edit within 1-, 2-, 4-, 5- and 6-month windows. For the survival period duration, we don't see any meaningful change over time.

Usage

References
