Navigation patterns are of great interest because ease of navigation bumps a website up in search rankings, and the importance of this can’t be overstated given the rising cost of the other major acquisition driver, paid search ads. Navigation patterns also matter because you’d want the user journey to lead to conversion, which is typically a purchase. In analytics parlance, users navigate between ‘events’ on the site, and events can be defined at varying degrees of granularity depending on the company. Events are also what constitute the different stages of a conversion funnel.
We will now perform some simple aggregations to detect conversion funnels. The dataset we will use is an e-commerce dataset from a retailer selling maternity clothing. It provides a high level of granularity, showing all the purchases in a session and the order in which they were made. Though the only navigation events in this dataset are purchases, we will imagine the purchases of these clothing items as different events on a website like L.L. Bean’s, such as clicking on:
Clothing Men’s Adults’ -> L.L. Bean Bandana -> Add to Bag -> Checkout
In our dataset, the events, limited to orders of clothes, are denoted by letters and numbers like A3, C4 and B10. Conventional full clickstream data isn’t used here because it is notoriously difficult to obtain, in large part due to the importance of consumer privacy.
If we take B10 to be our conversion event, then we’d want to find the most frequent sequences of events leading to B10. The chunk of code below, where ‘page 2 (clothing model)_y’ refers to the variable for online navigation events, shows a simple way to find the most frequent sequence(s) of events leading to conversion.
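Since the original code chunk isn’t reproduced here, below is a minimal sketch of the lag-based approach in pandas. The session column name (‘session ID’) and the toy data are assumptions; adjust them to match your dataset.

```python
import pandas as pd

def frequent_paths_to(df, event_col, session_col, target, n_lags=3, min_freq=10):
    """Find frequent sequences of `n_lags` events leading to `target`.

    Assumes `df` is sorted chronologically within each session.
    """
    df = df.copy()
    lag_cols = []
    for i in range(1, n_lags + 1):
        col = f"lag{i}"
        # shift(i) pulls the event that happened i rows earlier in the same session
        df[col] = df.groupby(session_col)[event_col].shift(i)
        lag_cols.append(col)
    # Keep only rows where the conversion event occurs and all lags exist
    hits = df[df[event_col] == target].dropna(subset=lag_cols)
    # Count occurrences of each (lag3, lag2, lag1) combination
    counts = (hits.groupby(lag_cols).size()
                  .reset_index(name="freq")
                  .sort_values("freq", ascending=False))
    return counts[counts["freq"] > min_freq]

# Toy example with a hypothetical session column
data = pd.DataFrame({
    "session ID": [1] * 5 + [2] * 5 + [3] * 5,
    "page 2 (clothing model)_y": ["B7", "B8", "B9", "B10", "A1"] * 3,
})
print(frequent_paths_to(data, "page 2 (clothing model)_y", "session ID",
                        "B10", min_freq=2))
```

On the toy data this surfaces B7 -> B8 -> B9 as the only frequent three-step sequence preceding B10, mirroring the result described next.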
The event column here is page 2 (clothing model)_y; the first lagged event is found through the shift(1) function, and further lags follow the same pattern. Once the 3 lags for the same person are lined up in each row where B10 occurs, we can count the total number of rows in the whole dataset per combination of lag1, lag2 and lag3. To look at only common sequences of events leading up to B10, we can filter to sequences that occur with a minimum frequency; >10 was chosen here. In a dataset with real clickstream events, you’d expect the most popular conversion funnels to account for a much larger percentage of total sequences. Here, only one sequence meets that frequency threshold: B7 -> B8 -> B9 -> B10 in chronological order.
The rationale for finding the most common route(s) to conversion is that you can then focus UX enhancements on that route. For example, if a certain button sits on the conversion route, you could A/B test different sizes or colors of the button to see if an even larger percentage of people move on to the next event in the funnel, and eventually, conversion.
Let’s extend our findings by trying to find the events that lead up to the start of this funnel, B7. One can treat B7 as a secondary ‘conversion’ point and try to find funnels leading into the ultimate funnel found previously.
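As a sketch of that secondary analysis, the same shift(1) trick gives the immediate precursors of B7. The session column and toy data below are hypothetical stand-ins:

```python
import pandas as pd

# Hypothetical column names; adjust to match your dataset
df = pd.DataFrame({
    "session ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "page 2 (clothing model)_y": ["A2", "B6", "B7",
                                  "B6", "B7", "B8",
                                  "A2", "B7", "B9"],
})

# Immediate predecessor of each event within a session
df["lag1"] = df.groupby("session ID")["page 2 (clothing model)_y"].shift(1)

# Most common events occurring just before B7
precursors = df.loc[df["page 2 (clothing model)_y"] == "B7", "lag1"].value_counts()
print(precursors)
```

In this toy data B6 is the most frequent event just before B7; extending to more lags works exactly as for B10.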
Similarly, one can pay extra attention to events along this secondary route, for example by prioritizing fixes for any rage clicking detected along it.
So far, we’ve only found events that co-occur within the same person, i.e. frequent journeys that users actually make. However, what if we made some assumptions to suggest new potential journeys that the UX team could facilitate more seamlessly, say via more obvious button links from one event to another? Take travel as an example: many vacationers who fly to Spain also like flying to Greece, and many who fly to Greece also like flying to Italy, yet you don’t see many travel package deals covering Spain to Greece to Italy, or Spain to Italy. One then starts wondering about the potential to get many more vacationers who’ve already traveled from Spain to Greece to also travel to Italy. The assumption is that vacationers going from Spain to Greece share some similarity with those going from Greece to Italy through their shared interest in Greece, so perhaps you’d try to get more vacationers flying from Spain to Italy, say by providing more direct flights between the two.
Let’s take a look at the network graph generated for the purpose of finding new potential funnels. This graph was created by pairing each event with its first lagged event within a person to create a tuple for each row. The resulting list of tuples forms the basis of the edge list, which, along with a list of nodes/vertices, is used to generate the network graph below. This is a directed graph, such that a node pointing to another node means that it precedes the node it’s pointing to. Only co-occurrences with a frequency over 50 were kept, and, as expected, B10, the conversion event, was an eligible node by this criterion.
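A minimal sketch of that graph construction, assuming networkx as the graph library (the original may have used another) and hypothetical toy data in place of the real clickstream:

```python
import pandas as pd
import networkx as nx
from collections import Counter

# Hypothetical toy data standing in for the real clickstream
df = pd.DataFrame({
    "session ID": [1, 1, 1, 2, 2, 2, 3, 3],
    "page 2 (clothing model)_y": ["A2", "B10", "A1",
                                  "A2", "B10", "B4",
                                  "A2", "B10"],
})

# Pair each event with its predecessor within a session
df["lag1"] = df.groupby("session ID")["page 2 (clothing model)_y"].shift(1)
pairs = df.dropna(subset=["lag1"])

# Count (predecessor -> event) co-occurrences and keep the frequent ones
edge_counts = Counter(zip(pairs["lag1"], pairs["page 2 (clothing model)_y"]))
min_freq = 2  # the article uses a threshold of 50 on the full dataset
edges = [(a, b, w) for (a, b), w in edge_counts.items() if w >= min_freq]

# Build the directed graph from the weighted edge list
G = nx.DiGraph()
G.add_weighted_edges_from(edges)
print(list(G.edges(data=True)))
```

The edge direction encodes chronology: an edge A2 -> B10 means A2 occurred immediately before B10 within a session.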
In the previous section where we detected funnels that were already common, we found only B7 -> B8 -> B9 -> B10. However, this graph network reveals that the most common events just prior to B10 are not B7, B8 or B9. Instead, we see A2 as an example of a frequent precursor to B10. Just focusing on opening up potential routes to B10 via A2 alone, we have A1 -> A4 -> A2 -> B10 and P1 -> A2 -> B10 as examples. There are not many frequent outgoing links from P1; thus, to really maximize the chance that someone reaches B10 from P1, we’d want to ensure that the retention from P1 to A2 is high, perhaps by adding a very visible link from P1 to A2. All these experiments with additional visibility of links or buttons and other UX features should, of course, be A/B tested where possible.
What else can we gather from such graph networks? A lot more, even without close inspection of these graphs, which can often be cluttered. There are plenty of influence metrics that one can calculate using libraries, including igraph. Some common ones are:
- degree centrality: how many links each node has
- closeness centrality: the average path length to all other nodes
- betweenness centrality: the number of times a node lies on the shortest path between other nodes
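All three metrics are available in common graph libraries; below is a sketch using networkx (igraph exposes equivalent functions) on a small stand-in graph whose node names echo the article’s events:

```python
import networkx as nx

# Small directed graph as a stand-in for the clickstream network
G = nx.DiGraph([("A1", "B4"), ("A2", "B4"), ("P1", "A2"),
                ("B4", "B10"), ("B4", "A1")])

# Degree centrality: how many links each node has (in + out), normalized
degree = nx.degree_centrality(G)

# Closeness centrality: based on average shortest-path length to a node
closeness = nx.closeness_centrality(G)

# Betweenness centrality: how often a node lies on shortest paths
# between other pairs of nodes
betweenness = nx.betweenness_centrality(G)

print(max(degree, key=degree.get))  # B4 is the best-connected node here
```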
How is each important here?
Degree centrality tells you how well-connected a node is. Both inbound and outbound links matter for all nodes. Even outbound links from the conversion event B10 can be important, as they could signal the potential for further purchases, leading back to the checkout cart (to buy a different item).
Without even calculating anything and just through visual inspection of the graph network, it may be obvious that B4 has the highest degree centrality. Not only that, it has the highest number of inbound links and only two outbound ones, to A1 and B10. B4 can thus be likened to a funnel with a large catchment area at the top that is extremely targeted at leading to conversion. To find more key conversion events in the funnel like B4 in a more automated way, simply find all events immediately preceding the conversion event and consider both the number of inbound links to them and the ratio of inbound to outbound links.
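A sketch of that automated check, assuming networkx and a toy graph in which B4 plays the role described above:

```python
import networkx as nx

# Toy directed clickstream graph; B4 funnels several events into B10
G = nx.DiGraph([("A1", "B4"), ("A2", "B4"), ("A3", "B4"),
                ("B4", "B10"), ("B4", "A1"), ("A2", "B10")])

conversion = "B10"
# Inspect every event that links directly into the conversion event
for node in G.predecessors(conversion):
    inbound = G.in_degree(node)
    outbound = G.out_degree(node)
    # High inbound count and high inbound:outbound ratio flag
    # wide-catchment, conversion-targeted nodes like B4
    ratio = inbound / outbound if outbound else float("inf")
    print(node, inbound, outbound, round(ratio, 2))
```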
Closeness centrality is a good proxy for how fast one navigates from one node to other nodes. Since only the length of the path to the conversion event really matters here, this may not be such a useful metric.
Betweenness centrality often tells you how much of a bridge a node is between two relatively isolated ‘islands’ of nodes. Removing this node or breaking this link (which you might diagnose by detecting signs of rage clicking) might cause people from one ‘island’ to rarely or never reach the ‘island’ containing the conversion event.
When we look at this graph network, there are quite a few events like A7, A8 and A12 that should be inspected as potential bridges between events outside this graph network and this group of events, which has the greatest potential for leading up to B10. You can run simple queries to find out if there are still many links from other events to A7, A8 and A12 even if they didn’t hit the frequency threshold of 50. If so, it’s important to further strengthen the connections between those other events and A7, A8 and A12.
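Such a query might look like the following sketch, where the edge-frequency table and its column names are hypothetical:

```python
import pandas as pd

# Hypothetical edge-frequency table: (predecessor, event, co-occurrence count)
edges = pd.DataFrame({
    "source": ["C1", "C2", "C3", "D1", "D2"],
    "target": ["A7", "A7", "A8", "A12", "B4"],
    "freq":   [12, 8, 30, 5, 120],
})

bridges = ["A7", "A8", "A12"]
# Inbound links to the bridge candidates that fell below the 50 threshold
sub_threshold = edges[(edges["target"].isin(bridges)) & (edges["freq"] < 50)]
print(sub_threshold.groupby("target")["freq"].sum())
```

A bridge candidate with a large sub-threshold inbound total is a good place to strengthen connections, since many users approach it through routes too rare to appear in the pruned graph.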