The Web Is Disappearing: Is It IT's Fault?


Link rot afflicts many websites, even as the tools to manage the issue go unused.

Credit: Melnikov Dmitriy / Shutterstock

Nothing lives forever, and researchers have confirmed that web pages are no exception. They pop into existence at one moment in time and have a habit of disappearing with an abrupt "404 not found" at an unknown point in the future.

The phenomenon has a name: "digital decay", or "link rot". Thanks to an analysis by the Pew Research Center, When Online Content Disappears, we can even put some numbers on it.

Looking at a random sample of web pages that existed in 2013, the researchers found that by 2023, 38% had disappeared. If it doesn't sound surprising that nearly four in ten web pages from 2013 had disappeared a decade later, consider that the researchers ran the same analysis on pages that appeared in 2023 itself and found that 8% had disappeared by the end of that year.

But what matters is not simply how many web pages have disappeared but where they disappeared from. On that score, 23% of news pages and 21% of pages on US government sites contained at least one broken link.

The most interesting barometer of all for link rot is Wikipedia, a site which depends heavily on referenced links to external information sources.

Despite the importance of references, the researchers found that at least one link was broken on 54% of a sample of 50,000 English-language Wikipedia entries. Of the one million references on those pages, 11% were no longer accessible.

Disappearing tweets

And it's not just links. Looking at that other cultural reference point, "tweets" on the X (formerly Twitter) platform, a similar pattern was evident. From a representative sample of 5 million tweets posted between 8 March and 27 April 2023, the team found that by 15 June, 18% had disappeared. And that figure could get a lot higher if the company ever stops redirecting URLs from its historic twitter.com domain name.

Some languages were more affected by disappearing tweets than others, with the rate for English-language tweets being 20% and for those in Arabic and Turkish an extraordinary 42% and 49%, respectively.

Pew is not the first to look into the issue. In 2021, an analysis by Harvard Law School of 2,283,445 links inside New York Times articles found that of the 72% that were deep links (i.e., pointing to a specific article rather than a homepage), 25% were inaccessible.

As a website that's been in existence since 1996, The New York Times is a good measure of long-term link rot. Not surprisingly, the further back in time you went, the more rot was evident, with 72% of links dating to 1998 and 42% from 2008 no longer accessible.

This study also looked at content drift, that is, the extent to which a page is still accessible but has changed over time, sometimes dramatically, from its original form. On that score, 13% of a sample of 4,500 pages published in the New York Times had drifted significantly since they'd first been published.
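For readers who want a feel for how drift might be measured, here is a minimal Python sketch. It is not the method the Harvard researchers used; it simply compares the words of an archived copy of a page with the words of the current copy. The function names and the 0.5 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher


def drift_score(original_text, current_text):
    """Return 0.0 for identical text and 1.0 for text with no words in common."""
    # Compare word sequences so that pure whitespace changes do not count as drift.
    original_words = original_text.split()
    current_words = current_text.split()
    similarity = SequenceMatcher(None, original_words, current_words).ratio()
    return 1.0 - similarity


def has_drifted(original_text, current_text, threshold=0.5):
    """Flag a page whose text has changed beyond an (assumed) threshold."""
    return drift_score(original_text, current_text) >= threshold
```

A page that gains one sentence scores close to 0, while a page whose article has been swapped for a generic landing page scores close to 1. Where to draw the line between the two is a judgement call.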

Where is IT going wrong?

Does any of this matter? One could argue that web pages disappearing or changing is inevitable even if not many people notice or care.

While the Pew researchers offer no judgement, the authors of the Harvard Law School study point out the problems link rot leaves in its wake:

"The fragility of the web poses an issue for any area of work or interest that is reliant on written records. [...] More fundamentally, it leaves articles from decades past as shells of their former selves, cut off from their original sourcing and context."

According to Mark Stockley, an experienced content management systems (CMS) and web admin who now works as a cybersecurity evangelist for security company Malwarebytes, while some link loss was inevitable, the scale of the issue suggested deeper administrative failures.

"People seem to be more ambivalent about losing pages than they used to be. When I first started working on the web, losing a page, or at least a URL, was anathema. If you didn't need a page any more you at least replaced it with a redirect to a suitable alternative, to ensure there were no dead ends," said Stockley.

"What's baffling is when CMSs don't pick up the slack. While some CMSs will catch mistakes and backfill URL changes with redirects automatically, there are others that, inexplicably, don't. It's an obvious and easy way to prevent a particular kind of link rot, and it's baffling that it exists in 2024," he said.
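The kind of automatic backfill Stockley describes can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than how any particular CMS works: the `RedirectTable` class and its method names are hypothetical, and a real system would persist the table and serve the redirect as an HTTP 301.

```python
class RedirectTable:
    """Remember where moved pages went so that old URLs never dead-end."""

    def __init__(self):
        self._redirects = {}  # old path -> current path

    def record_move(self, old_path, new_path):
        """Call whenever an editor changes a page's URL."""
        if old_path == new_path:
            return
        # Re-point earlier redirects that targeted the old path,
        # so visitors are never bounced through a chain of redirects.
        for source, target in list(self._redirects.items()):
            if target == old_path:
                self._redirects[source] = new_path
        self._redirects[old_path] = new_path
        # A path that is live again must not redirect elsewhere.
        self._redirects.pop(new_path, None)

    def resolve(self, path):
        """Return the path to redirect to, or None if no redirect applies."""
        return self._redirects.get(path)
```

For example, after `record_move("/news/old-title", "/news/new-title")` and a later `record_move("/news/new-title", "/news/final-title")`, both earlier URLs resolve directly to `/news/final-title`.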

If the CMS doesn't include a link checking facility, admins can instead deploy link checking tools that crawl a site to find broken links.

For CMS admins, spotting and correcting broken links should be a defined process, not an afterthought.
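As a rough idea of what such a tool does, the Python sketch below pulls the links out of one page and tests each with a HEAD request. It uses only the standard library; the function names, the user-agent string and the three result labels are assumptions made for illustration. A production crawler would also need rate limiting, robots.txt handling and a fallback to GET for servers that reject HEAD.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href> in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href or href.startswith("#"):
            return  # skip empty links and same-page anchors
        absolute = urljoin(self.base_url, href)  # resolve relative paths
        if urlparse(absolute).scheme in ("http", "https"):
            self.links.append(absolute)  # ignore mailto:, javascript:, etc.


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def check_link(url, timeout=10.0):
    """Return 'ok', 'broken' (404 or 410) or 'unreachable' for one URL."""
    request = Request(url, method="HEAD",
                      headers={"User-Agent": "link-check-sketch"})
    try:
        with urlopen(request, timeout=timeout):
            return "ok"  # urlopen raises HTTPError for 4xx and 5xx responses
    except HTTPError as err:
        return "broken" if err.code in (404, 410) else "unreachable"
    except OSError:
        return "unreachable"  # DNS failure, refused connection, timeout
```

Running `check_link` over the result of `extract_links` for each page, and reporting anything marked `broken`, gives admins a list of dead ends to fix or redirect.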

Anyone who wants more detail on the methodology behind When Online Content Disappears can follow this link (PDF).
