On November 27, starting at midnight UTC, nearly all Plus transfers from the web (in both the EU and US regions) and all transfers from the Mac app failed to finish uploading. We released a fix for Plus transfers by 12:15 UTC and a fix for the Mac app by 15:00 UTC.
A technical summary of what happened follows. At 00:00 UTC on November 27, the format of the AWS S3 notifications we use to process file uploads changed, and the version number included in the event schema was increased. Our systems then rejected these notifications, treating them as a newer format than they could process. Processing an upload made via POST requests (used for Plus transfers from the web and for uploads from the Mac app) relies on receiving an S3 notification for each uploaded chunk, so these uploads stayed in processing for a very long time and then failed. The repeated failures and retries overloaded the S3 bucket we use to store incoming data, which also made uploads via PUT requests (which do not use S3 notifications) less reliable.
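To make the failure mode concrete, here is a minimal Python sketch of a tracker that marks a POST upload complete only once a notification has arrived for every chunk, combined with a strict, exact-match version check. The class and function names, the chunk keys, and the version strings "2.1" and "2.2" are hypothetical. The record layout follows the general shape of S3 event notifications (a `Records` list whose entries carry `eventVersion` and `s3.object.key`), and we assume the version number in question is that `eventVersion` field.

```python
from dataclasses import dataclass, field

# Hypothetical: the single event schema version this processor was written for.
SUPPORTED_EVENT_VERSION = "2.1"


@dataclass
class PostUpload:
    """One POST upload; it completes only when every chunk has been confirmed."""
    expected_chunks: set[str]
    confirmed: set[str] = field(default_factory=set)

    def is_complete(self) -> bool:
        return self.expected_chunks <= self.confirmed


def handle_notification(upload: PostUpload, notification: dict) -> None:
    """Mark chunks as confirmed based on an S3-style notification payload."""
    for record in notification.get("Records", []):
        # Strict check: any version other than the known one is rejected,
        # so the chunk this record describes is never confirmed.
        if record.get("eventVersion") != SUPPORTED_EVENT_VERSION:
            continue
        key = record["s3"]["object"]["key"]
        if key in upload.expected_chunks:
            upload.confirmed.add(key)


def make_event(key: str, version: str) -> dict:
    """Build a minimal one-record notification (illustrative only)."""
    return {"Records": [{"eventVersion": version, "s3": {"object": {"key": key}}}]}


if __name__ == "__main__":
    upload = PostUpload(expected_chunks={"upload-1/chunk-0", "upload-1/chunk-1"})
    for chunk_key in sorted(upload.expected_chunks):
        handle_notification(upload, make_event(chunk_key, "2.2"))
    print(upload.is_complete())  # False: every notification was rejected
```

In this sketch the rejected notifications are simply dropped, so nothing marks the upload as failed; it just never becomes complete, and only an outer timeout would eventually fail it.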
Once we identified why uploads were not being processed (a failure one of our internal system tests had also caught), the fix was quick: we updated the system to accept the newer notification format as well.
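A fix of this kind can be sketched as replacing an exact version comparison with a compatibility check. This minimal Python sketch assumes the version is the `eventVersion` field of each record, formatted as `<major>.<minor>`; AWS's documentation of the S3 event message structure describes minor-version increments as adding new fields and major-version increments as backward-incompatible changes. It further assumes the change here was a minor-version bump. The constant and function name are hypothetical and purely illustrative.

```python
# Hypothetical: the major eventVersion this processor understands.
SUPPORTED_MAJOR_VERSION = 2


def is_acceptable_event_version(event_version: object) -> bool:
    """Accept any "<major>.<minor>" version whose major part matches.

    A higher minor version is assumed to only add fields that the parser
    can ignore; a different major version is still rejected.
    """
    if not isinstance(event_version, str):
        return False
    major, sep, minor = event_version.partition(".")
    # Require exactly "<digits>.<digits>"; anything else is treated as unknown.
    if not sep or not major.isdecimal() or not minor.isdecimal():
        return False
    return int(major) == SUPPORTED_MAJOR_VERSION


if __name__ == "__main__":
    for version in ["2.1", "2.2", "3.0", "2", None]:
        print(version, is_acceptable_event_version(version))
```

Checking only the major version, rather than accepting any version at all, keeps the guard against genuinely incompatible formats while tolerating additive changes.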