Motivation

Many rival firms simultaneously collaborate and compete in the open-source arena, a phenomenon known as open-coopetition (Teixeira and Lin 2014). How do rival technology vendors jointly develop the OpenStack cloud computing platform?

Firms simultaneously collaborating and competing in the OpenStack open-source project

Do firms like Rackspace, VMware, IBM and HP really collaborate with one another in the OpenStack open-source project?
Or does each of them work on its own niche of OpenStack?

Research Team

Jose Teixeira Turku School of Economics (TSE) University of Turku, Turku, Finland - jose.teixeira (at) utu.fi - http://users.utu.fi/joante/
Gregorio Robles Grupo de Sistemas y Comunicaciones (GSyC) Universidad Rey Juan Carlos, Madrid, Spain - grex (at) gsyc.urjc.es - http://gsyc.urjc.es/~grex/

Methodological details

We combined virtual ethnography (VE) with Social Network Analysis (SNA) over publicly available, naturally occurring open-source data. This allowed us to reconstruct and visualize the evolution of the OpenStack collaboration as a sequence of networks.

Virtual Ethnography

We started by screening, in an ethnographic manner, publicly available data such as company announcements, financial reports and the specialized press, which gave us insight into the industrial context.

Initial research questions

  1. Do firms collaborate on specific components (specialization), or is collaboration spread widely across components (hyper-collaboration)?
  2. Do firms that compete in the same revenue model, e.g. Mirantis and IBM on private clouds or Rackspace and HP on public clouds, collaborate less with each other?
  3. How does firms' centrality vary over time? How do firms react to exogenous industry events (e.g. Citrix announcing that it will move from OpenStack towards CloudStack)?
  4. Who are the most central firms in the project? Are firms central at the expense of the most central developers? Or are developers central thanks to the centrality of the firms they are affiliated with? What matters most for centrality in OpenStack, a "star" developer or a good team?
  5. Does team-size matter? Do firms need to be big to be central in the OpenStack project? Startups vs Techno-Giants? Mirantis vs IBM?
  6. Do centrality measures correlate with code-based metrics (lines of code, number of commits, implemented specifications)?
  7. Which OpenStack projects are more centralized? Which are less centralized?
  8. Do the most central actors and firms fix more bugs, or do they introduce more bugs?
  9. Which kinds of actors are more central within the collaboration network: coders, testers, bug-fixers, translators or release coordinators?

Initially screened websites

Table 1 - Initial sample of websites
Website Title
http://www.openstack.org/ OpenStack Open Source Cloud Computing Software
http://slashdot.org/ Slashdot: News for nerds, stuff that matters
http://www.zdnet.com/ ZDNet | Technology News, Analysis, Comments and Product Reviews for IT Professionals
http://news.cnet.com/ Technology News - CNET News
http://www.computerworld.com/ Computerworld - IT news, features, blogs, tech reviews, career advice
http://techcrunch.com/ TechCrunch - The latest technology news and information on startups
http://bitergia.com/ Software development analytics for Open Source projects - Bitergia
http://stackalytics.com/ Stackalytics | OpenStack community contribution in Juno release
http://cloudarchitectmusings.com/ Cloud Architect Musings | Musings On Cloud Computing and IT-as-a-Service
https://www.datacenterdynamics.com/ Datacenter Dynamics
http://www.bbc.co.uk/ British Broadcasting Corporation
http://www.nytimes.com/ The New York Times - Breaking News, World News & Multimedia
http://www.todayonline.com/ TODAYonline | Comprehensive Singapore and international news and analysis
http://www.koreatimes.co.kr/ The Korea Times
http://www.nation.co.ke/ Daily Nation: Home - Breaking News, Kenya, Africa, Politics, Business
http://elpais.com/ EL PAÍS: el periódico global
http://www.folha.uol.com.br/ Folha de S.Paulo - Jornal on-line com notícias, fotos e vídeos

Other screened websites

So many !!

Longitudinal Segmentation

Table 2 - Releases of the OpenStack project
Date Release name
Oct 21, 2010 Austin
Feb 3, 2011 Bexar
Apr 15, 2011 Cactus
Sep 22, 2011 Diablo
Apr 5, 2012 Essex
Sep 27, 2012 Folsom
Apr 4, 2013 Grizzly
Oct 17, 2013 Havana
Apr 17, 2014 Icehouse
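
The release dates in Table 2 are what slice the change-log into phases. The following is a minimal sketch of that segmentation, not the project's actual script: the function name `phase_of` and the rule that a commit made on a release day belongs to the phase ending with that release are assumptions made for illustration.

```python
from bisect import bisect_left
from datetime import date

# Release dates copied from Table 2.
RELEASES = [
    ("Austin", date(2010, 10, 21)),
    ("Bexar", date(2011, 2, 3)),
    ("Cactus", date(2011, 4, 15)),
    ("Diablo", date(2011, 9, 22)),
    ("Essex", date(2012, 4, 5)),
    ("Folsom", date(2012, 9, 27)),
    ("Grizzly", date(2013, 4, 4)),
    ("Havana", date(2013, 10, 17)),
    ("Icehouse", date(2014, 4, 17)),
]


def phase_of(commit_date):
    """Return a two-letter phase label such as 'AB' (Austin to Bexar).

    Assumption (hypothetical rule): a commit made on a release day counts
    towards the phase that ends with that release. Dates on or before the
    Austin release, or after the Icehouse release, return None.
    """
    dates = [d for _, d in RELEASES]
    i = bisect_left(dates, commit_date)  # index of first release >= commit_date
    if i == 0 or i == len(dates):
        return None
    return RELEASES[i - 1][0][0] + RELEASES[i][0][0]
```

For example, `phase_of(date(2013, 6, 1))` falls between Grizzly and Havana and yields the label `GH`.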

Purposive Sampling

The OpenStack Foundation governs dozens of open-source projects. We started by addressing the OpenStack NOVA project, the biggest and most "core" project in OpenStack.

OpenStack NOVA is a cloud computing fabric controller, the main part of an Infrastructure as a Service (IaaS) system. The project originally started at the NASA Ames Research Center and later evolved into a highly networked inter-firm open-source project developed by dozens of firms and thousands of developers.

We addressed the top 10 commercial firms contributing to the OpenStack NOVA project. The selection was based on OpenStack's own empirical measures: the number of commits, the number of committed lines of code, and the number of completed blueprints.

Table 3 - TOP 10 contributors to the OpenStack open-source project
Firm Description
canonical The makers of Ubuntu. Provider of support services for Ubuntu deployments in the enterprise.
citrix Multinational software company that provides virtualization, networking, software-as-a-service (SaaS), and cloud computing technologies.
cloudscale Services and open-source products company selling custom cloud infrastructure for large service providers, chiefly telecom service providers.
hp Multinational IT company. Provides hardware, software and services to consumers, small- and medium-sized businesses (SMBs) and large enterprises.
ibm Multinational technology and consulting corporation.
mirantis Northern California software company specializing in OpenStack.
nebula Northern California hardware and software company specializing in cloud computing.
rackspace Multinational IT hosting company.
vmware Software company that provides cloud and virtualization software and services.
redhat Multinational software company providing open-source software products to enterprises.

Social Network Analysis

After attaining a better understanding of the competitive dynamics of the cloud computing industry, we started extracting and analysing the social network of the OpenStack community using SNA (Scott, 2012; Wasserman and Faust, 1994), a method widely established across the social sciences (Borgatti and Foster, 2003; Uzzi, 1996; Wasserman and Faust, 1994; Watts, 2004).

The visualisation power of our approach, which data-mines the OpenStack source code and its version-control change-log, is illustrated in the following video:

OpenStack Open-Source Development

Research-data

Inputs for our analysis were the OpenStack source code and its version-control change-log. An archived version of the raw data is available as a single tarball. We covered contributions to the OpenStack source code from 30 May 2010 to 17 April 2014 (circa four years of longitudinal data). Data-cleansing efforts took several weeks, as OpenStack does not have very strict code-commit policies.

Source-code

The Python scripts that scrape the OpenStack change-log are available as a single tarball.

The first networks were written out ad hoc, directly from code, in the DL (UCINET), GraphML and Pajek formats.
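
To give an idea of what writing a network "in code" can look like, here is a minimal sketch that serialises an undirected network to the plain-text Pajek .net layout (a `*Vertices` section followed by an `*Edges` section). It is an illustration only and not the project's script; the function name `to_pajek` and the sample labels are hypothetical, and labels are assumed not to contain double quotes.

```python
def to_pajek(nodes, edges):
    """Serialise an undirected network to Pajek .net text.

    nodes: iterable of string labels.
    edges: iterable of (label, label) pairs; direction and duplicates are ignored.
    Assumption: labels contain no double-quote characters.
    """
    # Pajek numbers vertices from 1; sort labels for a reproducible file.
    index = {label: i for i, label in enumerate(sorted(set(nodes)), start=1)}
    lines = [f"*Vertices {len(index)}"]
    lines += [f'{i} "{label}"' for label, i in index.items()]
    lines.append("*Edges")
    # Store each undirected edge once, in a canonical (sorted) orientation.
    unique_edges = sorted({tuple(sorted(edge)) for edge in edges})
    for a, b in unique_edges:
        lines.append(f"{index[a]} {index[b]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_pajek(["bob", "alice", "carol"], [("bob", "alice"), ("carol", "bob")]))
```

The same node and edge collections could be written to GraphML or DL by swapping the serialiser, which is one reason to keep the network in simple Python structures until export.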

Networks

Both our community visualisation and sub-community detection approaches (network clustering) relied on 8 networks, each capturing different phases of the OpenStack development.

For understanding the evolution of the code-based collaboration, we connect developers who work on the same file, constructing a network of collaboration activities among developers. With the visualization of the network over time, we gain insights on collaboration and rivalry within the software project.

How we modeled the network
Modelling collaboration from the source-code repositories change-log

The collaborative network during a certain time slice can be formally defined as:
Gt = (V, Av, E)
Where:
V = A set of nodes representing the developers contributing to the OpenStack open-source software project.
E = A set of edges, each connecting two developers who have worked on the same source-code file.
Av = A set of node attributes capturing each developer's company affiliation. This information is extracted from the email address of each developer.
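
The definition above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scripts used in the study: the input is assumed to be already reduced to (developer email, file path) pairs for one time slice, the names `build_network` and `affiliation` are hypothetical, and the affiliation heuristic simply takes the second-level label of the email domain, which is far cruder than the weeks of data cleansing the real data required.

```python
from collections import defaultdict
from itertools import combinations


def affiliation(email):
    """Guess a firm label from an email address, e.g. 'x@uk.ibm.com' -> 'ibm'.

    Assumption: the label just before the top-level domain names the firm.
    This breaks for domains such as 'example.co.uk' and for personal addresses.
    """
    domain = email.rsplit("@", 1)[-1].lower()
    parts = domain.split(".")
    return parts[-2] if len(parts) >= 2 else domain


def build_network(changes):
    """Build (V, A_v, E) from (developer_email, file_path) pairs of one time slice."""
    authors_by_file = defaultdict(set)
    for email, path in changes:
        authors_by_file[path].add(email.lower())

    # V: every developer who touched at least one file.
    V = set().union(*authors_by_file.values())

    # E: one undirected edge per pair of developers sharing a file.
    E = set()
    for authors in authors_by_file.values():
        for a, b in combinations(sorted(authors), 2):
            E.add((a, b))

    # A_v: company affiliation of each developer.
    A_v = {dev: affiliation(dev) for dev in V}
    return V, A_v, E


if __name__ == "__main__":
    # Hypothetical change-log records, for illustration only.
    changes = [
        ("ann@rackspace.com", "nova/compute/api.py"),
        ("bob@redhat.com", "nova/compute/api.py"),
        ("cat@uk.ibm.com", "nova/db/api.py"),
        ("ann@rackspace.com", "nova/db/api.py"),
    ]
    V, A_v, E = build_network(changes)
    print(sorted(E))
    print(A_v)
```

Storing each edge as a sorted pair keeps the network undirected and avoids counting the same collaboration twice.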


Table 4 - Essential collaborative-networks driving the Social Network Analysis
Network Phase RawData
OpenStackSNA_AB From Austin release to Bexar release OpenStackSNA_AB.graphML
OpenStackSNA_BC From Bexar release to Cactus release OpenStackSNA_BC.graphML
OpenStackSNA_CD From Cactus release to Diablo release OpenStackSNA_CD.graphML
OpenStackSNA_DE From Diablo release to Essex release OpenStackSNA_DE.graphML
OpenStackSNA_EF From Essex release to Folsom release OpenStackSNA_EF.graphML
OpenStackSNA_FG From Folsom release to Grizzly release OpenStackSNA_FG.graphML
OpenStackSNA_GH From Grizzly release to Havana release OpenStackSNA_GH.graphML
OpenStackSNA_HI From Havana release to Icehouse release OpenStackSNA_HI.graphML

Visualizations were produced with the specialized SNA tools Pajek, Visone and Gephi.

Network structural properties were calculated with Pajek, Visone and R with the statnet package. Some additional network structural measures were also calculated using LibreOffice spreadsheet software.

The main methods used were:

  1. Visualisations with degree centrality.
  2. Calculations of node and group centrality measures.
  3. Markov chain clustering.
  4. Network modularity maximization with heuristics.
  5. Hub-based community detection with different parameter configurations.
  6. Simmelian Backbone extraction.
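
As a reminder of what the simplest of these measures computes, the sketch below implements normalised degree centrality in plain Python. The study itself relied on Pajek, Visone and R with statnet; this snippet only illustrates the measure, and the function name and sample network are hypothetical.

```python
def degree_centrality(nodes, edges):
    """Normalised degree centrality: a node's degree divided by (n - 1).

    Assumptions: the network is undirected, every edge appears once,
    there are no self-loops, and every edge endpoint is listed in nodes.
    """
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    if n < 2:
        return {node: 0.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}


if __name__ == "__main__":
    # A three-developer network in which 'a' has worked with both others.
    print(degree_centrality(["a", "b", "c"], [("a", "b"), ("a", "c")]))
```

A value of 1.0 means the developer shares at least one file with every other developer in the time slice.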

Visualisations

Bexar
From Austin release to Bexar release
Oct 21, 2010 to Feb 3, 2011

Cactus
From Bexar release to Cactus release

Diablo
From Cactus release to Diablo release

Essex
From Diablo release to Essex release

Folsom
From Essex release to Folsom release

Grizzly
From Folsom release to Grizzly release

Havana
From Grizzly release to Havana release

Icehouse
From Havana release to Icehouse release

Analysis of network visualizations
Making sense of network visualizations - a very long analysis process.

Measures

The selected network-measures were:

Nodes, edges and density
Rplot 1 - Evolution of network measures (developers, collaborative relationships and network density) over time

The number of nodes (networked developers) increased over time until the Havana release. The number of edges (collaborations between networked developers) also increased over time. However, the social graph became less dense over time, revealing a trend towards community sub-grouping. According to a number of social network scholars (White 2002; Friedkin 2004; Zou & Yilmaz 2011; Prell 2012), density is a rough measure of cohesiveness in large networks (a cohesive network is assumed to be less vulnerable to the removal of any one individual).
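
For reference, the density of an undirected network is the number of edges present divided by the number of edges possible, 2m / (n(n - 1)). The sketch below assumes this standard definition applies to the values reported here. The edge count in the example is illustrative and not taken from the data: with the 14 nodes of the Bexar network, 70 edges would give a density of about 0.769.

```python
def density(n_nodes, n_edges):
    """Density of an undirected network without self-loops: 2m / (n(n - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


if __name__ == "__main__":
    # Illustrative edge count (assumption): 70 edges among 14 developers.
    print(round(density(14, 70), 3))
```

Because the number of possible edges grows quadratically with the number of developers, density can fall even while the number of collaborations rises.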

From the Diablo to the Essex release, and from the Folsom to the Grizzly release, the social graph density increased along with the group size: even though the number of developers increased, the community became more cohesive. Given this momentum of cohesive group dynamics, those releases do not provide interesting data for sub-community detection in the OpenStack Nova project. Therefore, to better capture the different sub-communities in the OpenStack case, we opted to use data from the last OpenStack releases (Grizzly, Havana and Icehouse). Higher project maturity and a steady decline in group cohesion (i.e. a tendency towards sub-grouping) thus drove our selection of releases for sub-community detection.



Table 5 - Structural position of the top 10 firms within the collaborative network
A dash (-) marks a release for which the table holds no value for that firm.

Firm        Measure                              Bexar          Cactus         Diablo         Essex          Folsom         Grizzly        Havana         Icehouse
(all)       Number of nodes                      14             21             35             61             66             90             136            101
(all)       Network density                      0.769          0.686          0.555          0.617          0.4            0.436          0.368          0.281
canonical   % of developers                      -              -              -              8.16%          3.33%          8.14%          3.91%          2.97%
canonical   Centrality output                    -              -              -              3.0946022263   1.4063242792   4.0685858948   2.5862504576   1.481628607
canonical   Centrality efficiency                -              -              -              37.9088772726  42.1897283765  49.9854838508  66.2080117134  49.8814964369
citrix      % of developers                      21.43%         23.81%         11.43%         14.29%         5.00%          4.65%          2.34%          0.99%
citrix      Centrality output                    2.5519127825   3.1237702298   1.4240810088   4.9186122984   2.1632273523   2.4884264305   0.7146563749   0.4289705008
citrix      Centrality efficiency                11.9089263185  13.119834965   12.4607088274  34.4302860885  43.2645470467  53.5011682559  30.492005329   43.3260205828
cloudscale  % of developers                      -              4.76%          2.86%          2.04%          6.67%          3.49%          0.78%          0.99%
cloudscale  Centrality output                    -              0.6985576053   1              1              2.1463242398   2.2687930765   0.9449220528   0.0420035942
cloudscale  Group Eigenvector Centrality Degree  -              14.669709712   35             49             32.1948635968  65.0387348609  120.950022755  4.2423630161
hp          % of developers                      -              -              8.57%          16.33%         18.33%         16.28%         14.84%         13.86%
hp          Centrality output                    -              -              1.7128641375   6.0431780129   4.2367609092   7.1932023664   9.7639427872   7.3547222331
hp          Centrality efficiency                -              -              19.9834149376  37.0144653292  23.1096049593  44.1868145366  65.7781408825  53.0590675388
ibm         % of developers                      -              -              -              2.04%          21.67%         29.07%         33.59%         31.68%
ibm         Centrality output                    -              -              -              0.717416681    6.158596274    15.7589315539  20.6754304415  11.8180702722
ibm         Centrality efficiency                -              -              -              35.1534173667  28.4242904954  54.2107245455  61.5454673607  37.3007842965
mirantis    % of developers                      -              -              2.86%          2.04%          1.67%          1.16%          13.28%         9.90%
mirantis    Centrality output                    -              -              0.7435182384   0.0280453528   0.1594230137   0.1638083506   9.5131219812   2.5352644564
mirantis    Centrality efficiency                -              -              26.0231383436  1.374222287    9.5653808245   14.0875181501  71.628212564   25.6061710092
nebula      % of developers                      -              -              -              -              1.67%          1.16%          0.00%          0.00%
nebula      Centrality output                    -              -              -              -              0.6119974877   0.2507864269   0              0
nebula      Centrality efficiency                -              -              -              -              36.7198492619  21.5676327125  -              -
rackspace   % of developers                      78.57%         71.43%         68.57%         38.78%         28.33%         16.28%         12.50%         11.88%
rackspace   Centrality output                    9.010178391    11.8740251559  16.3592432294  12.136411647   11.1720360492  7.4434803333   6.1581240268   4.0487235866
rackspace   Centrality efficiency                11.4674997703  16.6236352182  23.8572297096  31.2991668792  39.4307154677  45.7242363331  49.2649922147  34.0767568543
redhat      % of developers                      -              -              -              16.33%         13.33%         19.77%         14.06%         16.83%
redhat      Centrality output                    -              -              -              7.5403822225   6.1810693793   9.5348787828   12.2549675963  10.7460833714
redhat      Centrality efficiency                -              -              -              46.1848411127  46.3580203444  48.2352691366  87.1464362401  63.8443776772
vmware      % of developers                      -              -              -              -              -              -              3.91%          10.89%
vmware      Centrality output                    -              -              -              -              -              -              1.2412541843   4.8746081635
vmware      Centrality efficiency                -              -              -              -              -              -              31.7761071192  44.7577658646
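
The figures in Table 5 are numerically consistent with one simple relation: a firm's centrality efficiency equals its centrality output divided by its share of developers, expressed as a fraction. This relation is inferred from the numbers and is not stated as a definition, so treat it as an assumption. The sketch below checks it for Citrix in the Bexar phase, where 21.43% of 14 nodes is taken to mean 3 of 14 developers (also an inference); the function name is hypothetical.

```python
def centrality_efficiency(centrality_output, firm_developers, total_developers):
    """Centrality output per unit share of developers (inferred relation).

    Assumption: efficiency = output / (firm_developers / total_developers).
    """
    if firm_developers == 0:
        return None  # no developers in this phase, so no efficiency value
    share = firm_developers / total_developers
    return centrality_output / share


if __name__ == "__main__":
    # Citrix, Bexar phase: output 2.5519127825 with an inferred 3 of 14 developers.
    print(centrality_efficiency(2.5519127825, 3, 14))
```

Under this reading, a small team with a high efficiency value achieves more centrality per developer than a large team with the same output.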

Findings

Here we list some preliminary interesting findings (at least for us):

Implications

Methodology

In our view, this study made some methodological contributions:

Theory

We believe our results confirm and reinforce the current body of theoretical knowledge on coopetition. But we would like to add that:

Moreover, our research reiterates the power of the open-source fork concept as a nexus enabling both features of competition and collaboration, confirming previous results (Teixeira and Lin 2014).

Open-coopetition and the importance of forking
Open-source fork as an enabler of both collaborative and competitive features.

The importance of forking in the OpenStack case is also highlighted in some of our internet-retrieved materials, for example in an official document from Red Hat, one of the top contributors to OpenStack:

"Red Hat is continuing its fully open source philosophy with all development done upstream in OpenStack rather than as proprietary, closed source add-ons. This keeps new-innovation in the core code and prevents unnecessary forking, interoperability, and incompatibility issues" (IDC report downloaded on the 3rd of May 2014).

Practice

Managerial practice

Software engineering practice

Open-coopetition software engineering practice
Combining code-driven metrics with Social Network Analysis visualizations

Regulatory practice

By better understanding both collaboration and competition in the open-source arena we are better prepared for:

Theory building

The Open-Coopetition theory
Drafting the Open-Coopetition theory
Coopetition vs Open-Coopetition
Contrasting Coopetition (established) vs. Open-Coopetition (proposed)
Open-coopetition in practice
Situating Open-Coopetition in practice

Key references

  1. Teixeira, J., and Lin, T. (2014). Collaboration in the open-source arena: The OpenStack case. arXiv:1401.5996.
  2. Myers, M. (1999). Investigating information systems with ethnographic research. Communications of the AIS 2, 1.
  3. Kozinets, R.V. (2002). The field behind the screen: Using netnography for marketing research in online communities. Journal of Marketing Research, 61–72.
  4. Scott, J. (2012). Social network analysis (SAGE Publications Limited).
  5. Wasserman, S., and Faust, K. (1994). Social network analysis: Methods and applications (Cambridge University Press).
  6. Borgatti, S.P., and Foster, P.C. (2003). The network paradigm in organizational research: A review and typology. J. Manag. 29, 991–1013.
  7. Uzzi, B. (1996). The sources and consequences of embeddedness for the economic performance of organizations: The network effect. Am. Sociol. Rev. 674–698.
  8. Watts, D.J. (2004). The "new" science of networks. Annu. Rev. Sociol. 243–270.
  9. Howison, J., Wiggins, A., and Crowston, K. (2011). Validity issues in the use of social network analysis with digital trace data. Journal of the Association for Information Systems 12.
  10. Bengtsson, M., and Kock, S. (2000). "Coopetition" in business networks: to cooperate and compete simultaneously. Industrial Marketing Management 29(5), 411–426.
  11. Von Hippel, E. (2009). Democratizing innovation: The evolving phenomenon of user innovation. International Journal of Innovation Science 1(1), 29–40.
  12. Lakhani, K., and Von Hippel, E. (2003). How open source software works: "free" user-to-user assistance. Research Policy 32(6), 923–943.

Publications

  1. Teixeira, J. (2014) Open-coopetition in the Cloud computing Industry: The OpenStack NOVA case. European Conference on Social Networks, Barcelona, July 1-4, 2014.

Acknowledgements

The idea for this research project emerged by pure serendipity at the Inforte seminar on Big Data and Social Media Analytics by Sudha Ram and Matti Rossi. The researchers are grateful for the financial support of the Fundação para a Ciência e a Tecnologia (grant SFRHBD615612009), Liikesivistysrahasto (grant 3-1815) and Marcus Wallenberg Säätiö (open-coopetition R&D management strategy). We also thank Lero - the Irish software engineering research centre, where part of this research was conducted. A last word of thanks goes to the OpenStack developers for developing cool, open and research-friendly technologies.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.