But most of this information is filtered out, leaving about 25 petabytes of data per year that the physicists consider worthy of analysis.
However, as well as the raw data from the four main detectors, Cern has some impressive infrastructure just to control and operate the LHC itself, including a database of sensor readings and the status of all the magnets and moving parts, which is currently growing by 50TB per year.
Speaking at NetApp's Agile Data Infrastructure Summit in London, Cern database services architect Eric Grancher gave some insight into the technology the organisation is using for this purpose, and the challenges it faces in operating the largest machine in the world 24 hours a day.
"The accelerator is running continuously for a good fraction of the year, and everything has to work. If the database which manages our sensor data is not up, then everything stops, so there are no maintenance windows," he said.
The infrastructure used to control and monitor the LHC is surprisingly familiar. Grancher said this is because much of Cern's staff consists of researchers who serve a relatively brief stint before returning to industry or other institutes, and so need to get to grips with the technology as quickly as possible.
Cern's current platform is based on Red Hat Enterprise Linux 5 and Oracle VM Server for x86, running atop a mixture of Xeon-based Dell PowerEdge M610 and R810 servers, with HP Procurve E8206zl Ethernet switches.
Storage is based on a mixture of NetApp hardware, comprising 3040, 3140, 3240 and 6240 arrays, with dozens of databases driven by Oracle 11g with Real Application Clusters (RAC) support.
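For applications, the practical effect of RAC is that a clustered database is addressed through a single service name, with the cluster deciding which node handles each session. The short Python sketch below illustrates the idea using the cx_Oracle client library; the hostname, service name and credentials are invented for illustration and are not Cern's actual configuration.

    import cx_Oracle

    # Illustrative values only: host, port, service name and credentials are made up.
    # A RAC database is addressed through one service name; the cluster decides
    # which node actually serves the session.
    dsn = cx_Oracle.makedsn("db-cluster.example.org", 1521,
                            service_name="sensor_archive")
    conn = cx_Oracle.connect(user="reporting", password="example", dsn=dsn)

    cur = conn.cursor()
    cur.execute("SELECT sysdate FROM dual")  # trivial query to confirm the session works
    print(cur.fetchone())
    conn.close()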
The NetApp arrays are largely filled with Sata disks combined with PCI Express Flash Cache cards to deliver the right level of performance, according to Grancher.
"We started, probably like everybody, using Fibre Channel disks or SAS disks, because for databases we have to have fast I/O operations. But people came to us and said they expected to store hundreds of terabytes, and Fibre Channel disks at the time were 140GB, so they just weren't viable because of the space constraints and also because of the cost," he explained.
For this reason, Cern worked with NetApp to evaluate Flash Cache, and decided very early on to pair it with Sata disks, completing the switch from mirrored Fibre Channel drives to Direct NFS on NetApp Sata storage with Flash Cache earlier this year.
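Direct NFS is an NFS client built into the Oracle database that bypasses the operating system's NFS stack and talks to the filer directly, and it is configured through an oranfstab file. A minimal sketch of such an entry is shown below; the server name, network paths and export are illustrative assumptions rather than Cern's actual settings.

    # $ORACLE_HOME/dbs/oranfstab -- illustrative values only
    server: netapp-filer1
    path: 10.0.0.11
    path: 10.0.0.12
    export: /vol/oradata  mount: /u01/app/oracle/oradata

With more than one path listed, the Direct NFS client can spread traffic across the filer's network interfaces and fail over between them if one becomes unavailable.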