Topic: Analytics & metrics for community based software development
Scheduled: 7B, Friday 2pm, space 7
Taking notes: @BruceCr
Editing notes: @jgbarah
Final notes of the session
We started by discussing how to track user communities, taking the Adobe Analytics case as an example:
- Tracking web page views, for example by embedding a 1x1 tracking pixel on each web page.
- Forum events. It is important to know what we want each community to do.
- Community users. Each user may have an ID, so that what they do can be related to their other actions.
- Users of a program, and the actions they perform, can be tracked. This way it can be detected whether they are blocked by a lack of features, and they can be asked about it.
- Programs must be deployed on the community server. Data can be sent to a customer support portal.
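The 1x1 tracking pixel mentioned above can be sketched as follows. This is a minimal, hypothetical illustration (endpoint path, parameter names, and log structure are all assumptions, not Adobe's actual implementation): each page embeds an image tag pointing at a pixel endpoint, and the server logs every request for the pixel before returning a transparent GIF.

```python
# Sketch of the 1x1 tracking-pixel technique (hypothetical names).
# Each page would embed: <img src="/pixel.gif?page=...&user=...">
import base64
from urllib.parse import urlparse, parse_qs

# A 1x1 transparent GIF, base64-encoded.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

pageview_log = []  # in a real deployment this would go to analytics storage

def serve_pixel(request_url):
    """Log the page view encoded in the query string, return the GIF bytes."""
    query = parse_qs(urlparse(request_url).query)
    pageview_log.append({
        "page": query.get("page", [""])[0],
        "user": query.get("user", [""])[0],  # per-user ID ties actions together
    })
    return PIXEL_GIF

serve_pixel("/pixel.gif?page=/docs/start&user=u42")
print(pageview_log[-1])
```

Because the browser fetches the image on every page load, the server sees one request per view, and the user ID in the query string lets those views be related to other actions, as discussed above.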
Building on this start, we had several comments:
- The metrics needed are different, depending on who the community manager is reporting to. For example, reporting to the marketing department is very different from reporting to the engineering department.
- Rule of participation: 90% of people are lurkers, 9% are contributors, and 1% are creators: http://en.wikipedia.org/wiki/1%25_rule_%28Internet_culture%29
- Very important to remember: any metric, any number, can be gamed and cheated. So be careful with metrics about relative performance: the actors involved are motivated to try to cheat them.
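The 1% rule mentioned above can be checked against a community's own activity data. A minimal sketch, with made-up data and an assumed classification (creators author content, contributors only react to it, lurkers do neither):

```python
# Checking the 90/9/1 participation split against per-user activity counts.
# The data and the (posts_created, replies) classification are assumptions.
activity = {
    "u1": (5, 2), "u2": (0, 3), "u3": (0, 0), "u4": (0, 0),
    "u5": (0, 0), "u6": (0, 1), "u7": (0, 0), "u8": (0, 0),
}

def participation_split(activity):
    """Fractions of creators, contributors, and lurkers in the community."""
    creators = sum(1 for posts, _ in activity.values() if posts > 0)
    contributors = sum(1 for posts, replies in activity.values()
                       if posts == 0 and replies > 0)
    lurkers = len(activity) - creators - contributors
    n = len(activity)
    return creators / n, contributors / n, lurkers / n

print(participation_split(activity))
```

A real community's split will rarely be exactly 90/9/1, but comparing the measured fractions against the rule of thumb can flag communities with unusually few creators.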
Then we moved on to discuss metrics and data that could be interesting to the community manager of a development community:
- Questions / answers (for a forum or mailing list).
- How many returning versus new logins (for the community website, assuming you can open an account on it).
- Average time between when a bug is submitted and it is fixed.
- Average time for a contribution to be entered into the code base. Related to this: Why is code review taking so long?
- People who left (for example, to make exit interview).
- How long it takes to set up the environment, up to the moment someone is in the position to start contributing.
- Number of commits
- Number of projects each person contributes to
- Ticket response time
- Tracking actions online and offline, and matching both. Offline example: Meetup groups. For how many people was the first point of contact an in-person event, after which they joined the online group? Or simply knowing the first point of contact, e.g. asking (and maybe answering) questions on StackOverflow.
- Bugs? Regressions? Other code quality issues?
- How often code extracts make it into other projects (would love to get this). GitHub can provide this.
- Core project versus side project. Split again: who is helping with the existing code versus who is contributing a new feature. How should the different groups be weighted?
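Two of the metrics listed above can be sketched concretely. This is a hypothetical illustration with made-up data, not tied to any particular tracker: average time from bug submission to fix, and returning versus new logins.

```python
# Sketch of two metrics from the list above (sample data is made up).
from datetime import datetime

bugs = [
    {"opened": datetime(2014, 5, 1), "fixed": datetime(2014, 5, 4)},
    {"opened": datetime(2014, 5, 2), "fixed": datetime(2014, 5, 10)},
]

def avg_days_to_fix(bugs):
    """Average time between bug submission and fix, in days."""
    deltas = [(b["fixed"] - b["opened"]).days for b in bugs if b["fixed"]]
    return sum(deltas) / len(deltas)

logins = ["alice", "bob", "alice", "carol", "alice"]  # login events, in order

def returning_vs_new(logins):
    """Count first-time logins versus logins by users seen before."""
    seen, new, returning = set(), 0, 0
    for user in logins:
        if user in seen:
            returning += 1
        else:
            new += 1
            seen.add(user)
    return new, returning

print(avg_days_to_fix(bugs))      # (3 + 8) / 2 days
print(returning_vs_new(logins))
```

Most of the other metrics in the list (ticket response time, time for a contribution to land, commits per person) reduce to the same pattern: extract timestamped events per user or per ticket, then aggregate.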
It was also recognized that we still need to decide which actions are most appropriate to take when these numbers increase or decrease.
During this discussion, some other aspects popped up. Just to highlight some of them:
- Community managers protect the project from management. When management asks for data, it is usually just to prove a stance they already hold.
- There are two very different cases: the code is committed by individual, independent volunteer developers, or the contributions are made by developers hired by a company. This can make a large difference in how the project performs.
Another topic that was widely discussed was "why collect information of this kind?". Some reasons:
- Know what is going on to make smarter decisions. Usually community managers are interested in this
- I have an outcome in mind, and I can prove it using metrics. Usually management or the PR department wants this.
Some open source projects (e.g., Apache) in some cases try not to pay too much attention to which company hires each developer, and instead consider that developers speak on their own behalf. There is a balance between companies riding the project and companies not being important for it; the by-laws of the project should be careful here.
Focusing on which companies contribute the most can cause land-grabbing.
A case found in the Drupal community: what was causing some spikes in contributions?
- The increase in participation happens about three months after their community event. Now that you know that, why does it take so long? Ramp-up time? Training?
- The metric had to be looked at a few months after the sprint, so you need continuous tracking.
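The Drupal observation above can be sketched as a simple lag measurement: given monthly contribution counts and the month of a community event, find how many months later the spike appears. The data and the spike threshold are assumptions for illustration only.

```python
# Sketch of measuring the lag between a community event and the
# contribution spike (counts and threshold are made up).
counts = [40, 42, 41, 43, 45, 44, 80, 78, 50, 46]  # contributions per month
event_month = 3  # index of the month the sprint/event took place

def months_until_spike(counts, event_month, threshold=1.5):
    """First month after the event whose count exceeds `threshold` times
    the pre-event average; returns the lag in months, or None."""
    baseline = sum(counts[:event_month + 1]) / (event_month + 1)
    for lag, count in enumerate(counts[event_month + 1:], start=1):
        if count > threshold * baseline:
            return lag
    return None

print(months_until_spike(counts, event_month))
```

This only works with continuous tracking, as noted above: a single snapshot taken at the event itself would miss the spike entirely.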
Some final thoughts:
For open source, numbers (and transparency in numbers) should be important.
Are developers important, or are communities important?