Library IT vs. the AI bots

An unprecedented attack tests the ingenuity of the University Libraries’ IT department and reveals a dark side of artificial intelligence.
June 9, 2025


by Judy Panitch

The first sign of something amiss was a cry for help.

“I’m getting reports from staff of a catalog error: ‘This website is under heavy load (queue full).’ I saw it myself once. Seems to be intermittent.”

Translation: On Dec. 2, 2024, the University Libraries’ online catalog was receiving so much traffic that it was periodically shutting out students, faculty and staff, including the head of User Experience. Could the Library’s IT experts take action?

Heavy use of the catalog during finals week is typical as students look up books and articles for term papers and projects. This was different. 

“It was just a boatload of traffic, more than we had any reasonable expectation of getting,” recalls David Romani, a system administrator and the Library’s security liaison. Normal heavy use might involve 100 simultaneous searches. Now, internal logs showed 500 or more searches at a time, overloading the system and triggering glitches.

In many computer attacks, related internet (IP) addresses or a single internet service provider (ISP) might behave suspiciously. Administrators stop the attack by blocking those computers. The Library permanently bans more than 4 million IP addresses—most of them overseas—because of prior bad behavior. The University blocks millions more at the campus level.

What Romani found surprised him. The searches were coming from addresses spread broadly across the United States using reputable ISPs such as AT&T, Spectrum and Verizon. Each interaction looked exactly like something that happens thousands of times a day at a research library like Carolina’s.

Left to right: David Romani, Tim Shearer and Jason Casden worked with the Library’s IT team and campus colleagues to thwart bots attacking the online library catalog.

The battle escalates

The IT team was at an impasse. “We were trying to find something to hook on to,” says Casden. He started reading the queries in real time. “I opened it and watched it stream past on the screen, like in a hacker movie. The requests were flying by, thousands per minute. And we started noticing strange patterns.”

Among the first things to stand out were odd requests, such as searches for Finnish music. “In November, before we had this problem, we got something like 15 searches with the terms ‘Finnish’ and ‘music.’ Basically zero on the scale we operate. On December 4 alone, there were 11,329 searches from thousands of different internet addresses,” Casden recalls.
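A count like the one Casden describes could be produced from request logs along the following lines. This is a minimal sketch, not the Library’s tooling: the log format (pairs of client address and request URL), the search parameter name `q` and the function name `count_term_searches` are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def count_term_searches(entries, terms, param="q"):
    """Count searches containing every term, plus the distinct clients behind them.

    entries: iterable of (client_ip, url) pairs (an assumed log format).
    terms:   words that must all appear in the search text.
    param:   name of the search parameter (assumed to be "q").
    """
    wanted = {term.lower() for term in terms}
    hits = 0
    clients = set()
    for client_ip, url in entries:
        # parse_qs decodes "+" and percent-escapes in the query string.
        values = parse_qs(urlsplit(url).query).get(param, [])
        words = set(" ".join(values).lower().split())
        if wanted <= words:  # every wanted term is present
            hits += 1
            clients.add(client_ip)
    return hits, len(clients)
```

Comparing the two numbers is what makes the pattern visible: a high search count spread across thousands of distinct addresses cannot be stopped by blocking a handful of machines.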


Moreover, each query was unique. Bots were selecting and combining different search options from menus within the catalog. These facets help researchers narrow results by specifying things like date or place of publication, language or location at a specific campus library. 

“A human might apply up to a half-dozen facets,” says Casden. “We were seeing requests with 15, 20, 25 facets, which is almost impossible to do, even deliberately.” 

The IT team countered, setting up rules to bounce out computers that made two highly complex queries in a row.
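A rule of this kind can be sketched in a few lines of Python. It is not the Library’s actual configuration, which the article does not describe. The facet parameter prefix `f[`, the threshold of 15 facets and the names `count_facets` and `ComplexQueryGuard` are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit

# Assumed values. The article says a human might apply up to a half-dozen
# facets and bots sent 15 or more; the real thresholds are not given.
FACET_PREFIX = "f["       # hypothetical facet parameter prefix
COMPLEX_THRESHOLD = 15    # facets at or above this count as "highly complex"
MAX_CONSECUTIVE = 2       # complex queries in a row before a client is blocked


def count_facets(url: str) -> int:
    """Count query parameters whose name starts with the facet prefix."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return sum(1 for name, _ in pairs if name.startswith(FACET_PREFIX))


class ComplexQueryGuard:
    """Block a client after consecutive highly complex queries."""

    def __init__(self) -> None:
        self._streak = defaultdict(int)  # client address -> current run length
        self.blocked = set()

    def allow(self, client_ip: str, url: str) -> bool:
        """Return True if the request may proceed, False if it is bounced."""
        if client_ip in self.blocked:
            return False
        if count_facets(url) >= COMPLEX_THRESHOLD:
            self._streak[client_ip] += 1
            if self._streak[client_ip] >= MAX_CONSECUTIVE:
                self.blocked.add(client_ip)
                return False
        else:
            # An ordinary query resets the run, so only back-to-back
            # complex queries trigger a block.
            self._streak[client_ip] = 0
        return True
```

Keying the rule on behavior rather than origin matters here, because the requests came from reputable ISPs and could not be told apart by address alone.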

It worked for a week. Then the odd searches resumed, only this time 100% of them were coming directly from China.

In a way, that was good news, allowing the IT team to block entire IP address ranges all at once. In just a few days, they had banned nearly 2 million addresses.
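Blocking by range is efficient because one entry can cover many addresses. The sketch below uses Python’s standard `ipaddress` module to show the idea. The ranges listed are reserved documentation addresses, not the ones the Library blocked, and the names `BLOCKED_RANGES`, `is_blocked` and `blocked_ipv4_count` are hypothetical.

```python
import ipaddress

# Documentation-only example ranges (reserved for use in examples),
# standing in for a real block list, which the article does not publish.
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),   # 256 addresses
    ipaddress.ip_network("198.51.100.0/25"),  # 128 addresses
    ipaddress.ip_network("2001:db8::/32"),    # an IPv6 range
]


def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    ip = ipaddress.ip_address(address)
    return any(
        ip.version == net.version and ip in net for net in BLOCKED_RANGES
    )


def blocked_ipv4_count() -> int:
    """Total number of IPv4 addresses covered by the block list."""
    return sum(net.num_addresses for net in BLOCKED_RANGES if net.version == 4)
```

A single /16 range covers 65,536 IPv4 addresses, so a block list totaling millions of addresses can be built from a comparatively small number of entries.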

Before the team could rest, the searches resumed and intensified. 

Shearer turned to the University’s Information Technology Services, which serves the entire campus. They had never encountered an attack quite like this either, and they readily brought their security and networking teams to the table. By mid-January a powerful AI-based firewall was in place, blocking the bots while permitting legitimate searches.

An uneasy peace

While the attack has been blunted, skirmishes continue to erupt and slow service. The attacks have migrated around the globe, sometimes coming from China, sometimes from South America, sometimes from distributed networks in the U.S.

Shearer has spoken to peers around the country who have faced similar onslaughts. The consensus explanation, he says, is a vast scraping operation to gather training data for the large language models (LLMs) that power generative AI programs. Large language models are trained on massive datasets filled with human language samples. That training is how ChatGPT and other AI engines “learn” to speak and write like humans. AI companies are insatiably hungry for fresh sources and invest heavily in capturing new content.

“The crawling was very sophisticated from the standpoint of evasiveness, but not from the standpoint of efficiency,” says Casden. “They were spending days and days pulling up the same catalog records through different paths. They probably didn’t even care or notice that they were being wasteful.”

Shearer describes the current situation as being “months into an arms race,” and he recognizes that, while the battle is likely to go on, the University Libraries is in a better position than many others to take it on.

“It took a team of seven people and more working almost a full week to figure out how to stop this stuff in the first instance,” says Shearer. The University’s IT office also provided invaluable reinforcement. “There are lots of institutions that do not have the dedicated and brilliant staff that we have, and a lot of them are much more vulnerable.”

Shearer also credits decades of work by the entire Library & Information Technology department for creating robust systems with multiple back-ups that protect critical functions. Despite occasional hiccups, the online catalog never went down entirely, thanks to these precautions.

Most of the time, says Romani, security and system management work is invisible, and that’s as it should be. “We know we’re doing our jobs when nobody calls. People don’t call us unless something goes wrong.”

As both generative AI and scraping techniques become more sophisticated, the IT team at the University Libraries is prepared for more challenges… and more calls.
