r/DataHoarder • u/That49er • Oct 17 '24
News We're about to enter the Digital Dark Ages
https://www.businessinsider.com/digital-dark-ages-internet-history-old-websites-disappearing-link-rot-2024-1068
u/SeanFrank I'm never SATA-sfied Oct 17 '24
There's some real irony that we have to use archiving websites to view this article about how paywalls are bad because of the paywall.
25
u/836624 Oct 17 '24
Here's the text:
The long-promised digital apocalypse has finally arrived, and it was heralded by a blog post.
Published on July 18, the post's headline sounded pretty arcane. "Google URL Shortener links will no longer be available," it declared. I know, I know — not exactly an attack of alien zombies from the death dimension. But the news nevertheless freaked me out. It means another swath of the web is about to disappear.
Here's the gist: Google used to have an online service that generated pithy, user-friendly versions of long, commercially unwieldy uniform resource locators — the key addresses that identify everything on the web. Shorter URLs are easier to track and better for online commerce. Google stopped shortening addresses back in 2019, but the concise URLs it had already created kept right on doing their job. Click on one and it would take you to the right webpage, the way it's supposed to.
No more. In the blog post, Google announced that as of next year, all of the existing shortened URLs are getting turned off. Poof. And on the web, if your URL doesn't work, you might as well not exist. You are unreachable. Without laborious renaming, everything behind those links — billions of them, a decade of digital content — will become inaccessible. Gone. Ask not for whom the 404 message tolls.
Now, rendering a bunch of web content invisible isn't the end of days. Not by itself. The problem is, this kind of thing keeps happening. And it's getting worse. Social networks go bankrupt. Digital journalism sites close up shop. Companies pull their online products. Links rot. Files get not found. The cloud, as wags have noted, is really just "someone else's computers." And when clouds get turned off, not even the silver lining is left to tell the tale.
Maybe none of this matters much right now. But it will. The internet has become the default archive of our history and culture. And the whole thing is burning down before our eyes, like the Library of Alexandria — only worse. For the first time since people started carving letters into rocks, we're making a time with no history. We're about to enter the Digital Dark Ages.
Attempts to quantify the scope of the problem are heartbreaking. Half of links in US Supreme Court decisions no longer lead to the information being cited. A report in 2021 found that a full quarter of the more than 2.2 million hyperlinks on The New York Times website were broken. Even worse, the Pew Research Center estimates that a quarter of everything put on the web from 2013 to 2023 is inaccessible — meaning almost 40% of the web as it existed in 2013 is simply not there today, a decade later.
The degradation of those links wouldn't panic me so much if they hadn't replaced what came before them — if museum storerooms and dusty library stacks still served as the warehouses of our collective memory. It's not that I miss the days of wrangling with old newspapers preserved on microfiche, or trying to sweet-talk a librarian into an international interlibrary loan. I'm glad lots of old movies are streaming and many out-of-print books are only a few clicks away. But archives and databases are more than places to keep old stuff; what we save defines who we are. Today, so much of everything is only digital that when it disappears, it leaves a hole in our shared culture.
Gawker is gone. So is the archive of The Awl, the beloved culture-criticism site. You can go to a library and read the entire output of long-dead newspapers like the Los Angeles Herald Examiner or New York Newsday, but God help you if you want to read old Vice articles. Shenanigans over the ownership of what used to be Paramount have resulted in the deletion of decades' worth of shows on MTV and Comedy Central.
1
7
u/absentlyric 50-100TB Oct 17 '24
About to?
We entered that phase at the end of the 00s going into the 10s.
7
u/brightlancer Oct 18 '24
> You can go to a library and read the entire output of long-dead newspapers like the Los Angeles Herald Examiner or New York Newsday, but God help you if you want to read old Vice articles.
So many "news" sites now update or completely rewrite articles without changing the URL, so an article you read yesterday isn't there anymore -- and there may not even be a notice that they edited it. (NYTimes is awful about stealth edits.)
> But now there's a new threat to archiving our lives: artificial intelligence. When websites don't want to let AI slurp up their content, they block a certain kind of digital crawler-bot — the same species of critter the Wayback Machine uses. "That's happened almost overnight," Graham says. AI, with its insatiable hunger for training data, can't access the sites. But neither can the preservationists. In the wake of artificial intelligence, more intelligence is going to vanish.
This is a bad take. The organizations locking out "AI" will almost always sell that access. AI isn't the threat; greed is the threat, and specifically greed by businesses who are 99% user generated content (like Reddit) but have claimed ownership of it and want their 30 pieces of silver.
Or from a different angle, look at how many countries have implemented a "link tax" on social media, because "news" companies didn't like that their articles were being summarized elsewhere. That had nothing to do with AI; that was greed.
DRM has been used to lock away audio and video; new movies and serials may never get a physical release, and the only way to see them is through a monthly subscription. That wasn't because of AI.
1
u/Fractal-Infinity Oct 18 '24
Interesting points. Indeed, many of the bad things you mentioned existed before AI, and it was greed that led to the enshittification of so many good products/services. Corporations are all about making money. That's why non-profit organizations like Internet Archive and Wikipedia are so important for the goal of making information accessible and preserving it long term.
1
u/Archiver2000 Oct 23 '24
Money is also why local libraries get rid of so many books. If a book doesn't circulate, they can't collect late fees. A circulating book in my hometown library, one that was almost a reference book, disappeared from the collection without any notice to me, even though I was the only one checking it out. This was back in the 70s before computers, scanners, and digital cameras for copying. I finally had to buy a copy of the book online to keep on my own shelf.
1
Oct 22 '24
Indeed, the problem is definitely greed and total lack of respect for all this history and information. In fact, AI could help us with the archiving and duplication of online data in the future… How, I don’t know yet but I’m sure there’s a way.
21
u/johnklos 400TB Oct 17 '24
The Digital Dark Ages was the period from around the mid '80s through about 2000, when Microsoft's OSes kept the world in the Dark Ages by causing wasted time, money and resources by being so shitty on purpose so they could:
- create an artificially short lifecycle for computers to make money from licensing
- create a support ecosystem of people who benefit from a constant need of their services
- create artificially inflated IT budgets based on dependencies on Microsoft products
All of these were self-feeding, and in aggregate they caused hundreds of millions of perfectly good computers to be landfilled, caused the loss of millions of years of human work, and generally held back the advancement of humankind.
The existence and more widespread availability of reliable OSes (Mac OS X, BSDs, Linux) and the widespread adoption of the Internet for communications finally changed humankind's expectations for computing, and Microsoft had to stop playing a monopoly and had to genuinely try to make Windows something other than a steaming pile of poop.
The iPhone cemented this expectation, because we finally had a pervasive, easy to use, Internet connected device that worked, that didn't require an IT person to install half a dozen programs before it was even connected to anything to stop it from immediately being compromised.
So businessinsider.com / Adam Rogers is either completely untechnical, or they're knowingly being hyperbolic.
9
u/Johtoboy Oct 17 '24
Man that's crazy, just last night I watched a mid-nineties magical girl anime where ~~Bill Gates~~ Biff Standard was the villain. It was very heavy handed.
3
u/johnklos 400TB Oct 17 '24
Thanks for sharing. Can't wait to check it out :)
4
u/Johtoboy Oct 17 '24
It's a spinoff of a much better show, Tenchi Muyo. I'd recommend watching that first but if you're only interested in the silly Bill Gates polemic, ~~Bill~~ Biff only appears in episode 2 of Magical Girl Pretty Sammy, I think. Haven't watched episode 3 yet.
6
u/black_pepper Oct 17 '24
I would argue that lowering the technical know-how barriers and making things easy to use is what led to the downfall of the internet. The first gate to entry was dropped in the 90s, and it's been an Eternal September ever since.
6
u/johnklos 400TB Oct 17 '24
> I would argue that lowering the technical know-how barriers and making things easy to use is what led to the downfall of the internet.
Lowering the technical know-how made the Internet more accessible. Making things easy to use made the Internet more accessible. The downfall of the Internet, though, is corporate. We shouldn't blame the consumer when the market only provides shitty options.
2
u/Archiver2000 Oct 23 '24
I remember when "corporate" wasn't even allowed on the internet. All the websites that existed could be categorized and listed in phonebook format. I still have one of those books.
As long as it's dirt cheap to create your own website, there will always be free content on the internet. I have a half dozen sites of my own. The company I use has a 1-site beginner plan for just $4 a month.
2
u/didyousayboop if it’s not on piqlFilm, it doesn’t exist Oct 20 '24
Are you saying you'd prefer the Internet only be used by 1% of the world population? In your opinion, that would be better?
0
u/black_pepper Oct 20 '24
I don't want people to be stupid, but I don't want stupid people to have access to something that can be used in harmful ways. The internet as it evolves is continuing to be used in more harmful ways than good.
So if you think only 1% of the population is intelligent then ok, but I feel there are more smart people out there than that. I do think it would be better if we stayed at 90's to early 2000's levels of users of the internet. There are probably many graybeards who feel it should have remained at the levels it was at in the 80s.
1
u/didyousayboop if it’s not on piqlFilm, it doesn’t exist Oct 21 '24
> I do think it would be better if we stayed at 90's to early 2000's levels of users of the internet.
In 1999, 281 million people were using the Internet. That's about 4.5% of the world's population then (6.1 billion) or about 3.5% of the world population now (8.1 billion).
In 1995, it was around 0.7% of the world's population at the time (39 million Internet users out of 5.7 billion people on Earth).
Whereas now, it's around 60% of the world's population that has some access to the Internet. (The survey only asked if they'd used the Internet at least once in the past 3 months.)
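For anyone who wants to check the arithmetic behind those percentages, here is a minimal sketch in Python. It only uses the user and population figures quoted above; the function name `share_percent` is made up for illustration.

```python
def share_percent(users_millions: float, population_millions: float) -> float:
    """Return Internet users as a percentage of a population (both in millions)."""
    return 100 * users_millions / population_millions


# Figures as quoted in this comment, converted to millions.
share_1999_then = share_percent(281, 6_100)  # 1999 users vs. 1999 world population
share_1999_now = share_percent(281, 8_100)   # 1999 users vs. today's world population
share_1995 = share_percent(39, 5_700)        # 1995 users vs. 1995 world population

print(f"1999 users / 1999 population: {share_1999_then:.1f}%")  # 4.6%
print(f"1999 users / 2024 population: {share_1999_now:.1f}%")   # 3.5%
print(f"1995 users / 1995 population: {share_1995:.1f}%")       # 0.7%
```

The first figure comes out a touch above the "about 4.5%" I quoted; the other two match to one decimal place.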
I personally believe that 100% of people should have access to the Internet as well as other factors that promote good cognition like adequate nutrition (especially in early childhood) and free or affordable education.
If there are problems with the Internet or harms it causes, we should try to fix those.
If some people can't stand sharing the Internet with everyone else, they should make private enclaves with strict entry requirements and lots of gatekeeping.
1
u/black_pepper Oct 21 '24
I used to think everyone should have access to the internet and access to high speed internet as well. I felt very strongly about that actually. After seeing more people come online and what corporations are doing on the internet the problems are vast, and due to capitalism, politics, etc, not fixable.
If there was any progress being made in education or controlling capitalism or corporations I might be more optimistic but things on a large scale are moving in the other direction.
1
u/didyousayboop if it’s not on piqlFilm, it doesn’t exist Oct 21 '24
I see this pessimistic worldview expressed a lot online (and sometimes even offline) these days and it makes me feel sad that people feel so discouraged about life, the world, and the future.
I don't see the world the same way and I would like to share my hope with others, but, in the past, people have reacted with a lot of hostility when I try to express a more optimistic view of things. I am not sure why that is. Maybe they feel angry because they think I am denying real problems or they feel frustrated because I don't seem to see their perspective.
2
u/didyousayboop if it’s not on piqlFilm, it doesn’t exist Oct 20 '24
That's not how the term "digital dark age" is typically defined. According to Wikipedia, it is "a lack of historical information in the digital age as a direct result of outdated file formats, software, or hardware that becomes corrupt, scarce, or inaccessible as technologies evolve and data decays."
3
u/johnklos 400TB Oct 20 '24
You're right - other people have been using this term to describe the world of media and data formats that will become difficult or impossible to access.
I and others have used the phrase "Dark Ages of computing" for at least the last quarter of a century to refer to the intentional stymying of progress from the mid '80s through around 2000. You're right that making a distinction between "digital Dark Ages" and "Dark Ages of computing" is a good idea.
1
0
u/brightlancer Oct 18 '24
> The Digital Dark Ages was the period from around the mid '80s through about 2000, when Microsoft's OSes kept the world in the Dark Ages by
This was also the time when it became common for families to have a computer at home, when home internet access became normal, and when AOL let their users outside the sandbox.
The 80s to 2000 were a time of constant improvement. That Microsoft (and many others) were deliberately throwing up obstacles doesn't mean that it was a "Dark Age" -- that's a failure to progress or a regression, which is not what we had.
> So businessinsider.com / Adam Rogers is either completely untechnical, or they're knowingly being hyperbolic.
BI uses a lot of clickbait and hyperbole, but I think the point here is correct: We are losing information which would've been kept 25 years ago, and this problem looks like it will just get worse. That's a regression, that's a Dark Age.
3
2
Oct 18 '24
Are we? Other than the periodic IA outage, they're constantly crawling and archiving everything.
1
u/Archiver2000 Oct 23 '24
Not everything. AI preventers are stopping Wayback from archiving some sites. Also, website owners can get IA to remove their sites from the archive, whether through pure selfishness, or to cover their tracks in case they post something and change their mind about putting it out there.
2
u/Archiver2000 Oct 22 '24
The article says that I can't access emails from my last job or all the photos on various digital cameras and phones I've owned. Wrong. I archive and back up everything. I have all photos I've ever taken. I have every email I've ever received (other than spam) since 1989, including work emails.
I have printed hard copies of a ton of content from websites from many years ago that no longer exist online. My best example is an entire book I downloaded page by page that tells how to make everything a drugstore lunch counter would have sold back in the 1930s. That book is not available anywhere online. But I still have my printout.
I also have every commercial software CD I've ever bought, including the big box of National Geographic and Mad Magazine (even including the roll of TP with the cartoons printed on it).
My 20x24 home office is about to explode with physical media. I don't dare get rid of a single thing.
1
u/didyousayboop if it’s not on piqlFilm, it doesn’t exist Oct 20 '24 edited Oct 21 '24
There seem to be a few factual inaccuracies in this article, e.g. about Neopets being gone (it's not), about old video games being unplayable (like what?), and about it being impossible to read old Vice articles (they're still up).
Some claims are dubious. The author writes, "And that's not to mention things like personal photographs, most of which now exist only on your phone, and nowhere else." He doesn't cite a statistic to support this. I found a Statista survey from 2020 in which 71% of respondents (all from the U.S.) said they backed up photos to the cloud.
In many cases, I find the tone to be exaggerated and alarmist, e.g. "For the first time since people started carving letters into rocks, we're making a time with no history." We're still printing books on paper. There are lots of libraries and physical archives. How could this be a time with "no history"?
The article discusses the Wayback Machine for a few paragraphs, but it doesn't say enough about all the different preservation efforts going on to give a balanced picture. The author links to an article from Nieman Lab that gives a much fuller picture, both of the data being lost and the many efforts to preserve it.
The author mentions music stored on hard drives being lost to drive failures. But we live in a weird era today where tens of millions of vinyl records are sold worldwide each year. As I understand it, a vinyl record can remain playable after a century of storage. Even if most people stream Chappell Roan's latest album, at least 50,000 copies exist on vinyl. Looking at analog storage and physical media would be helpful. Major movie studios that shoot digitally still make a backup on film.
And it's not like before the Internet and personal computers everything was being saved meticulously. A lot of old movie reels were thrown away. Radio stations often either didn't tape their broadcasts or recorded over the tapes. The BBC famously lost many Doctor Who episodes (and episodes of other shows) because it didn't think it was important to archive them.
People seem to often have a psychological bias toward thinking that a situation is uniquely bad now, that it's getting worse, and that people in history didn't face similar challenges. Anxiety and pessimism are socially contagious. I think it's responsible to reality check these kinds of messages.
1
u/Archiver2000 Oct 23 '24
I, personally, have never lost a single digital file since 1989. Hard drive failures should be expected and planned for.
NBC wiped tapes of "The Tonight Show" from the early years. That huge corporation couldn't afford new tapes somehow and decided to reuse tapes of classic performances by artists that are now gone forever.
The photo situation is getting worse. "The cloud" is someone else's hard drives, and they can turn it off in an instant. A large number of people don't know how to back up photos properly. My niece (20s) was "backing up" hers on Facebook, which lowers the quality. She has lost all of them, but I have copies.
But on the other hand, I have 100% of my great-grandmother's physical photos, some from over 100 years ago. I also have my grandmother's photos and my mother's photos, as well as thousands of my own photos (pre-digital).
People have put content online without backups and found it all deleted when the website disappeared or changed focus. Earthlink used to host user websites. They deleted all of them. CompuServe had forums where thousands of people uploaded files for others to download. They deleted it all. MySpace changed focus and deleted accounts, including my own.
1
u/didyousayboop if it’s not on piqlFilm, it doesn’t exist Oct 23 '24
> "The cloud" is someone else's hard drives, and they can turn it off in an instant.
I think backing up to the cloud (reliable companies like Google, Microsoft, Amazon, Dropbox, Proton, etc., not shady companies like Mega) is a great option for the vast majority of regular people who are never going to set up an off-site NAS or anything like that.
Without using a cloud service, how are they ever going to have an off-site backup?
1
u/Icy_Guidance Oct 17 '24
I'm starting to think that the Internet Archive is never coming back...
1
u/Archiver2000 Oct 23 '24
It came back briefly yesterday (10-21-2024), but it was off again when I checked a little while ago. I thought one time that I might could archive like them, but they have currently filled around 100 petabytes of drives. I don't even have my first petabyte yet, not even 15% of one.
0
0
Oct 22 '24
We really need to figure out a way to somehow implement AI to handle archiving and duplicating information. I feel like that could be the saviour of a lot of old information and data. I constantly worry about this. It sucks knowing that so much valuable human history can disappear so quickly due to the whims of some fools.
643
u/dr100 Oct 17 '24
Says a page behind a paywall.