Using Meetup data to explore the UK digital tech landscape

Reblogged with permission from Juan Mateos-Garcia, whose team at Nesta studies innovation in creative and digital industries. His team is interested in tracking the emergence of new technologies, and communities of innovators. This blog presents what they found using the data available from our API. The article was originally published on Nesta.

Juan is also a member of the Data Visualisation Brighton Meetup and of the Big Data Debate in London.

 

Tech meet-ups have become an important feature of the digital innovation landscape. In these events, coders, designers, hackers and entrepreneurs (among others) come together to learn from each other and network. Meetups can help participants keep their technology skills fresh in fields that move too fast for universities and training providers, and facilitate collaboration and job mobility, increasing the connectivity and efficiency of local innovation ecosystems.

Websites like Meetup and Eventbrite have emerged to make it easier for people to create and manage meet-ups.[i] The data generated by these platforms could help us understand when and where new technology communities emerge and evolve, and how they are connected to each other. It could also tell us something about the rise of new technologies. These are questions of obvious interest for policymakers, entrepreneurs, businesses and investors who want to identify the right communities of innovators to work with, and the right technologies to target.

In this blog, we undertake a preliminary exploration of UK tech meet-ups listed on Meetup, to assess the platform's potential as a source of information about the structure, geography and evolution of digital tech in the UK.

About the data

Meetup was created in 2002 to help people connect with others in their community. It currently has over 20 million users in 192,000 groups in 181 countries. When registering, users express interest in particular topics (e.g. “data science” or “online marketing”), and are shown information about groups near them that focus on those topics (or similar ones).[ii] Users can join those groups to receive updates about forthcoming events. Meetup charges group organisers a monthly subscription fee.

To get the data, we query the Meetup API for groups in the “Tech” category in UK cities (based on this Wikipedia list). This returns 3,707 groups as of 2nd April 2015. After removing duplicates, we are left with 1,391 groups for which we have information on location, membership, starting date, description and topics.[iii]
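Footnote [iii] explains that the duplicates arise because each city query also returns groups in nearby locations, and that they were removed using the group ID. A minimal sketch of that step in Python follows; it assumes each API result is a dictionary with an `id` key, and the function name and sample records are hypothetical rather than taken from the authors' code.

```python
# Sketch: merge per-city API results and keep one record per group ID.
# ASSUMPTION: each result is a dict with an "id" key; names and data are hypothetical.


def deduplicate_groups(city_results):
    """Flatten per-city result lists, keeping the first record seen for each group ID."""
    seen = set()
    unique = []
    for results in city_results:
        for group in results:
            if group["id"] not in seen:
                seen.add(group["id"])
                unique.append(group)
    return unique


# Hypothetical data: group 2 is returned by both the London and the Reading query.
london = [{"id": 1, "name": "Group A"}, {"id": 2, "name": "Group B"}]
reading = [{"id": 2, "name": "Group B"}, {"id": 3, "name": "Group C"}]
groups = deduplicate_groups([london, reading])
print([g["id"] for g in groups])  # [1, 2, 3]
```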

These 1,391 groups are based in 160 unique locations in the UK, and have a gross total of 434,826 members.[iv] Of these groups, 71% have been created since 2013, which is consistent with (though not proof of) increasing levels of meet-up activity in recent times.

A graph of the tech landscape

In aggregate terms, the groups in our list focus on 2,569 topics. We want to arrange these topics into a smaller set of “tech fields” containing inter-related topics. To do this, we follow a ‘data-driven’ approach based on scientometric principles (scientometrics being the quantitative analysis of science and technology through metrics such as academic papers and patents).

The basic idea is that topics in the same tech field will often be mentioned by the same Meetup groups.[v] For example, if the business challenge of creating value from big data requires the combination of database technologies, analytics methods and parallel processing frameworks, these topics are likely to be of interest to the same practitioners. As a consequence, we would expect to find them mentioned by the same groups, in a way that defines a ‘data’ technology field and its community of practitioners.
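The post does not spell out how the links between topics are weighted. One simple assumption, sketched below in Python, is to weight each pair of topics by the number of groups that mention both; the function name and sample groups are illustrative only.

```python
from collections import Counter
from itertools import combinations


def topic_cooccurrence(group_topics):
    """Count, for every unordered pair of topics, how many groups mention both.

    ASSUMPTION: link weight = number of co-mentioning groups.
    """
    pairs = Counter()
    for topics in group_topics.values():
        # sorted(set(...)) drops repeated topics and gives each pair a canonical order
        for a, b in combinations(sorted(set(topics)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical groups and their topics.
groups = {
    "g1": ["big data", "machine learning", "hadoop"],
    "g2": ["big data", "hadoop"],
    "g3": ["javascript", "agile"],
}
edges = topic_cooccurrence(groups)
print(edges[("big data", "hadoop")])  # 2
```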

We visualise these associations in a “topic network” where topics that are often mentioned together are linked and “pulled together” (see graph below).[vi] After constructing that network, we use community detection algorithms to look for densely connected “clusters” of topics inside it.[vii] This results in the identification of six tech fields:

  • Application: includes topics representing industries and domains where digital technologies are being applied, such as startups and entrepreneurialism, social media, digital marketing, educational technology, and mobile and web design.
  • Data: includes topics related to data and analytics, such as big data, data science, predictive analytics, machine learning, open data or data mining.
  • IT systems: includes topics related to IT engineering and systems administration.
  • Hardware: includes topics related to technologies and skills with a hardware component, such as 3D printing, Internet of Things or Robotics, as well as Maker communities.
  • Python: Interestingly, the community detection algorithm does not allocate the programming language Python to any of the other tech fields. One explanation is that Python is a general-purpose programming language in its own right (it has many links to topics in the Application and Software fields), but at the same time it is gaining popularity among data analysts and data scientists (in the Data field).
  • Software: includes topics related to general-purpose software programming languages (e.g. JavaScript) and development methodologies (e.g. agile). It also contains some generic terms like “coding” and “computer programming.”

[Graph: network of Meetup tech topics]

So what does this mean?

For starters, our topic analysis reveals what appear to be distinct, meaningful tech fields, as well as some interesting relationships between them. Unsurprisingly, the widely applicable programming languages and development methodologies in the Software field are the most central in the map. After all, they can be used in other areas (and are therefore mentioned by groups in those areas). This underscores the value of investing in general-purpose technologies and skills that can be applied in a variety of industries and jobs.

When we look at the relations between tech fields, we find many connections between Data and IT Systems. One potential explanation for this is the importance of IT infrastructure for the deployment of big data initiatives. The position of “Ed Tech” as a connector between the Application and Hardware fields might be explained by the importance of “Making” in tech teaching at schools.

The emergence of new tech fields

We now determine the tech specialisation of Meetup groups through the topics they mention. The approach is straightforward: if more than 50% of the topics mentioned by a group are in tech field X, then that group is deemed to specialise in field X. If no tech field is “in the majority” among a group’s topics, we place the group in a “mixed” category.[viii]
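The majority rule above can be sketched in a few lines of Python. This is a hypothetical helper, not the authors' code, and it assumes that topics without an assigned field are simply ignored when computing the 50% threshold.

```python
from collections import Counter


def specialise(topics, topic_field):
    """Return the field holding more than 50% of a group's classified topics, else "mixed".

    ASSUMPTION: topics missing from topic_field are ignored.
    """
    fields = [topic_field[t] for t in topics if t in topic_field]
    if not fields:
        return "mixed"
    # With a strict majority, at most one field can pass, so the top count suffices.
    field, count = Counter(fields).most_common(1)[0]
    return field if count > len(fields) / 2 else "mixed"


# Hypothetical topic-to-field mapping.
topic_field = {"big data": "Data", "hadoop": "Data", "javascript": "Software"}
print(specialise(["big data", "hadoop", "javascript"], topic_field))  # Data
print(specialise(["big data", "javascript"], topic_field))  # mixed
```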

The chart below shows the historical evolution of tech fields (based on the creation of groups specialising in them).[ix]

[Chart: creation of new UK tech Meetup groups over time, by tech field]

The chart shows the relative importance of different tech fields in the UK tech meetup landscape. The Application field, capturing practitioners who come together to focus on business applications of digital technology in various industries, is the most important. This is what one would expect, given the expanding number of industries that are becoming digitised.

The chart also helps us track the emergence of new fields on Meetup. While groups in the Application field have had a presence on the platform since its early days (the first UK group on the platform, “London Web”, specialising in web design and development, was created in 2002), Software appears in 2009, Data in 2010 (becoming more significant from 2011, with the explosion of interest in “big data”), and Hardware and IT Systems more recently, in 2013.

The IT Systems field is interesting in that, even though it only started gaining a presence on Meetup in 2013, it includes older engineering topics like Continuous Delivery or Kanban. One interpretation is that larger corporations (and their employees) have started participating more actively in meet-ups in recent times, perhaps in response to the fast rate of change in tech infrastructure, a wider adoption of “open innovation” practices, and more intense professional networking by people in systems administration and IT engineering roles.

What is the geography of UK tech meetups (and the tech communities underpinning them)?

In the map below, we have mapped tech groups in the UK by tech field using the locations supplied by the group organisers. The colour of the circles, and the facets in the map, indicate the tech field.[x] The size of the circles represents the number of groups in a field in a location, after a logarithmic transformation to prevent London (which hosts 56% of the groups in our list) completely dominating the map. Below each map, we list the top cities in each field (by number of groups), sorted alphabetically.

[Map: UK tech Meetup groups by location and tech field]
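The logarithmic scaling of circle sizes can be illustrated as follows. The post does not say which logarithm was used; this sketch assumes `log1p` (that is, log(1 + n)) so that a location with a single group still gets a non-zero circle, and the sample counts are invented.

```python
import math
from collections import Counter


def circle_sizes(group_locations):
    """Map each (location, field) pair to a log-scaled count of groups.

    ASSUMPTION: log1p is used so that a single group still gets a visible circle.
    """
    counts = Counter(group_locations)
    return {key: math.log1p(n) for key, n in counts.items()}


# Hypothetical data: 100 Data groups in London, 1 in Leeds.
sizes = circle_sizes([("London", "Data")] * 100 + [("Leeds", "Data")])
print(round(sizes[("London", "Data")] / sizes[("Leeds", "Data")], 1))  # 6.7
```

On a raw scale the London circle would be 100 times larger than the Leeds one; after the transformation the ratio is under 7, which keeps smaller locations visible.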

Although the analysis reveals some “usual suspects” one often hears mentioned in discussions about the UK “tech” economy, such as London, Bristol, Brighton, Cambridge or Edinburgh, we also see high levels of tech meetup activity in locations such as Belfast, Birmingham, or Cardiff. In general, the North of the UK has less tech meet-up activity than the South (echoing the findings of Nesta’s geography of the creative and high-tech economy).

When looking at these maps, we need to bear in mind that the tech group counts do not consider the number of active members that a group has, nor are they normalised by population size. This means that some of the “hotspots” might simply be areas with many people and/or businesses. Going forward, we will look for ways to estimate indices of relative specialisation in the UK tech economy that control for population size.

Tracking “spikes” of activity in technology topics over time

Could we use Meetup data to track the rise of “hot new technologies” (as well as their decline)? The idea is plausible: an explosion of new Meetup groups mentioning a technology suggests interest in it, and perhaps even its adoption, by communities of practitioners or businesses.

To explore this idea, we have compared, inside each tech field, the monthly creation of new groups mentioning topic X with the monthly creation of new groups in the field overall.[xi] This results in a monthly index for each topic, where scores above 1 indicate that, in that month, a larger share of all the groups with that topic was created than of all the groups in the tech field overall. An index of 10 for topic X in a month would tell us that the month was 10 times as important for the creation of groups with topic X as for the creation of groups in that tech field overall.
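One reading of this description, consistent with footnote [xi], is the ratio I(X, m) = (n_X,m / N_X) / (n_F,m / N_F), where n_X,m and n_F,m are the numbers of groups created in month m with topic X and in field F, and N_X and N_F are the totals over the whole period. The exact normalisation is an assumption; the Python sketch below implements that reading with hypothetical month labels.

```python
from collections import Counter


def spike_index(topic_months, field_months):
    """Monthly spike index for one topic inside one tech field.

    ASSUMED formula: (share of the topic's groups created in month m) divided by
    (share of the field's groups created in month m). Values above 1 mean the
    month mattered more for the topic than for the field as a whole.
    """
    topic_counts = Counter(topic_months)
    field_counts = Counter(field_months)
    n_topic, n_field = len(topic_months), len(field_months)
    return {
        month: (topic_counts[month] / n_topic) / (n / n_field)
        for month, n in sorted(field_counts.items())
    }


# Hypothetical field of 10 groups; the 2 groups mentioning the topic both started in January.
field = ["2013-01"] * 2 + ["2013-02"] * 8
topic = ["2013-01"] * 2
print(spike_index(topic, field))  # {'2013-01': 5.0, '2013-02': 0.0}
```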

The chart below plots this for the top 100 topics (in terms of group mentions) in each tech field between 2010 and 2015. We have coloured in red the top 5 most “spiky” topics in each tech field for the whole period (i.e. the 5 topics which deviated the most from overall activity in the field between 2010 and 2015).[xii]

The chart shows high levels of interest in digital publishing in the Application field over 2011-2012, and in Artificial Intelligence Programming in the Data field over that same period. Some other topics represent specific technologies like Couchbase (a distributed database for unstructured data), Openstack (open source software for managing cloud computing systems) or Meteor (a platform for building web and mobile apps) that have garnered unusual levels of interest on Meetup, in terms of new group formation.[xiii]

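[Chart: monthly activity index for the top 100 topics in each tech field, 2010 to 2015, with the five spikiest topics per field in red]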

Conclusion and next steps

This initial exploration of Meetup data suggests that it offers a promising tool for improving our understanding of the emergence and evolution of new technology fields and communities, their interrelationships and their geography.

These are some of the things we are planning to do next:

  • Develop our geographical analysis, normalising measures of Meetup activity to calculate indices of local specialisation in different tech fields.
  • See if we can use Meetup data to measure industrial networking and knowledge spillovers in creative and tech clusters.
  • Implement more advanced methods to detect the emergence of new technologies (and allied skills) to support decision-making by policymakers and practitioners.
  • Do all of the above drawing on other Meetup data (e.g. on members and events) that we did not use in this post, and combining Meetup with other data sources, both official and web-based.

We will also look for ways to determine the limitations of the data, such as its unequal coverage of tech communities in different fields and places, or the potential biases introduced by Meetup’s recommendation system. In doing this, we hope to gain an understanding of the tech landscape that was not possible before in terms of its precision, detail and timeliness. We will keep you posted about what we learn.

(This blog benefitted from Hasan Bakhshi and Cath Sleeman’s comments; the image comes from HackNY)


[i] It is worth pointing out that these websites also support the creation of groups that are not focused on technology but on other shared interests and hobbies.

[ii] Meetup organisers are initially prompted to select topics that have already been used by others on the platform. They have an incentive to do this, since using topics already present on the platform helps their groups be discovered by other users. There is also an option to create new topics (e.g. new technology fields not yet present on the platform).

[iii] The reason for the duplicates is that Meetup.com’s API returns groups in the vicinity of the location being queried, and this results in overlaps (especially in cities close to London). We de-duplicated using the group ID.

[iv] This does not mean that there are 434,826 individuals involved in tech meet-ups in the UK, since this number is likely to include duplicates (individuals participating in more than one group), as well as inactive users.

[v] Meetup automatically recommends related topics and this creates the risk that the community detection analysis might be biased by such recommendations (in other words, it would be capturing Meetup’s tendency to recommend similar topics together rather than their actual similarity).

[vi] When creating the visualisation, we focused on the top 100 topics in the list of tech groups in terms of overall mentions. Although these are only 100 of the 2,569 topics mentioned by the groups (about 3.9%), they account for 51% of all topic mentions. The distribution of topic mentions includes a small number of “popular topics” like “programming” or “web development” and a long tail of “niche topics” that are only mentioned by a handful of groups, or even by a single group (this is the case for over half of the topics).
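The two shares quoted here (share of distinct topics versus share of all mentions) can be computed as below. This is an illustrative Python helper with made-up data, not the authors' code.

```python
from collections import Counter


def top_n_shares(group_topics, n):
    """Return (share of distinct topics, share of all mentions) held by the n most-mentioned topics."""
    mentions = Counter(t for topics in group_topics for t in topics)
    top = mentions.most_common(n)
    topic_share = len(top) / len(mentions)
    mention_share = sum(count for _, count in top) / sum(mentions.values())
    return topic_share, mention_share


# Hypothetical groups: "a" is popular, the other topics form a long tail.
group_topics = [["a", "b"], ["a", "c"], ["a", "d"]]
print(top_n_shares(group_topics, 1))  # (0.25, 0.5)
```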

[vii] Different community detection algorithms have their pros and cons (see this paper for an overview). We use an ensemble of them implemented in R’s igraph package, and benchmark them against each other using the modularity score of their proposed decomposition of the network (modularity compares the density of connections within components with what we would expect to find if connections were randomly distributed). To put it a different way, we look for the algorithm that divides the network into the most clearly separated components. Our benchmarking leads us to select the “Leading Eigenvector” method, which uses matrix algebra to decompose the network into components, and looks for those decompositions that optimise the modularity score.
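The modularity benchmark can be illustrated with a small Python function that scores a given partition of a weighted, undirected network using Newman's modularity. It is a sketch of the scoring step only: it does not implement the Leading Eigenvector method itself, and the edge and membership formats are assumptions.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity of a partition of a weighted, undirected graph.

    edges: {(u, v): weight}, each undirected edge listed once, no self-loops.
    membership: {node: community label}.
    Q = sum over communities c of  w_in(c) / W - (s(c) / 2W) ** 2,
    where W is the total edge weight, w_in(c) the weight inside c and
    s(c) the summed weighted degree (strength) of the nodes in c.
    """
    total = sum(edges.values())
    inside = defaultdict(float)
    strength = defaultdict(float)
    for (u, v), w in edges.items():
        strength[membership[u]] += w
        strength[membership[v]] += w
        if membership[u] == membership[v]:
            inside[membership[u]] += w
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2 for c in strength)


# Two triangles joined by a single bridge edge (2-3), all weights 1.
edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1, (3, 4): 1, (4, 5): 1, (3, 5): 1, (2, 3): 1}
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
whole = {n: "a" for n in range(6)}
print(round(modularity(edges, split), 3), round(modularity(edges, whole), 3))  # 0.357 0.0
```

Splitting the two triangles apart scores 5/14 (about 0.357), while lumping every node into one component scores 0, so a benchmark of this kind would prefer the split.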

[viii] We could have also identified “communities of groups” using an analogous approach to what we did with the topics (i.e. draw a graph where groups that mention similar topics are connected, and then decompose it into its components). However, this would have been more computationally intensive (it would have required considering the shared topics in almost a million combinations of groups) and harder to visualize (the resulting network would have included 1391 nodes).
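The “almost a million” figure is the number of unordered pairs among the 1,391 groups:

```latex
\binom{1391}{2} = \frac{1391 \times 1390}{2} = 966{,}745
```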

[ix] One thing to note is that groups can change topics over time in ways that might shift their specialization (imagine a “Software” group that starts selecting data topics and eventually becomes specialized in that field). Since we have no information on the time when such changes took place, but only their most recent topic selection, we are not able to track them retrospectively.

[x] The geographical coordinates were not consistently supplied across groups in a given location (i.e. sometimes the Meetup API returns different longitudes and latitudes even for groups that supplied the same location), so we have averaged them over all the groups giving the same location. Overlaps between circles simply show instances where users have provided locations close to each other, or even referring to geographies inside each other (e.g. boroughs inside London).
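The averaging described here could look like the following Python sketch. The `location`, `lat` and `lon` keys are assumed names, not necessarily the fields returned by the Meetup API, and the coordinates are invented.

```python
from collections import defaultdict


def average_coordinates(groups):
    """Average (lat, lon) over all groups that report the same location name.

    ASSUMPTION: each group is a dict with "location", "lat" and "lon" keys.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # location -> [sum_lat, sum_lon, count]
    for g in groups:
        s = sums[g["location"]]
        s[0] += g["lat"]
        s[1] += g["lon"]
        s[2] += 1
    return {loc: (lat / n, lon / n) for loc, (lat, lon, n) in sums.items()}


# Hypothetical groups reporting the same location with slightly different coordinates.
groups = [
    {"location": "Cambridge", "lat": 52.20, "lon": 0.12},
    {"location": "Cambridge", "lat": 52.22, "lon": 0.10},
]
lat, lon = average_coordinates(groups)["Cambridge"]
print(round(lat, 2), round(lon, 2))  # 52.21 0.11
```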

[xi] We divide monthly group creation in a topic by the total number of groups with that topic over the whole period in order to rescale the series, which prevents more popular topics from drowning out less frequently used (perhaps emerging) ones. We divide the monthly creation of groups in a topic by the monthly creation of all groups in the field to control for the fact that Meetup is becoming more popular overall, and for seasonality in group creation.

[xii] This has the goal of excluding very infrequently appearing topics where the creation of a single new group might result in a spurious spike of activity.

[xiii] This analysis has some important limitations. In some instances, the monthly numbers of groups are small, so it is likely that some of the spikes in the chart are random. The lack of data about changes of topics in existing groups could “smooth” the series by making it look as if topics had appeared earlier (when those older groups were started) than was the case. We would also like to weight the data by the number of members in different groups to get a sense of the size of the communities expressing interest in emerging topics. We will look for ways to address these limitations going forward.