Sharing 500 MB of JSON in the Browser: Resumable Uploads and Gzip Streaming

The Share feature in Big JSON Viewer takes the JSON you’re looking at, gzips it in the browser, uploads it to your own Google Drive as a resumable upload, makes it anyone-with-link, and hands you back a URL. On the receiving end, opening the URL streams the gzipped bytes back, decodes them on the fly, and renders the result. The whole pipeline has to work on 500 MB JSON, has to show real progress, has to be cancellable, and has to survive a network hiccup gracefully.

There’s a fair amount of subtle stuff in there. This post walks through it.

Why Drive’s multipart endpoint isn’t enough

The simple way to upload to Drive is the multipart endpoint: one POST with the metadata and file body in the same request. It maxes out at 5 MB. We started there. It worked. Then we tried sharing a 100 MB JSON and it stopped working.

The Drive API has a second endpoint, the resumable upload protocol. You POST to the upload endpoint with ?uploadType=resumable, sending the metadata and no file body; Drive responds with a Location header containing an upload URL tied to that session. You then PUT the file body to that URL, optionally in chunks, with Content-Range headers telling Drive which bytes you’re sending. Resumable upload handles any file size Drive supports (up to 5 TB).
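As a rough sketch of that first step, the request below follows the publicly documented Drive v3 upload endpoint and its optional X-Upload-Content-* hint headers. The helper names (buildSessionRequest, openUploadSession) and the way the access token is passed in are assumptions for illustration, not the app’s actual code.

```typescript
// Sketch only: helper names are hypothetical. The endpoint and headers follow
// the public Drive v3 resumable-upload documentation.
const DRIVE_UPLOAD_URL =
    'https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable';

interface SessionRequest {
    url: string;
    method: 'POST';
    headers: Record<string, string>;
    body: string;
}

// Build the session-opening request: metadata only, no file bytes.
function buildSessionRequest(
    metadata: Record<string, unknown>,
    byteLength: number,
    mimeType: string,
    accessToken: string,
): SessionRequest {
    return {
        url: DRIVE_UPLOAD_URL,
        method: 'POST',
        headers: {
            Authorization: `Bearer ${accessToken}`,
            'Content-Type': 'application/json; charset=UTF-8',
            // Optional hints about the bytes that will follow in the PUTs.
            'X-Upload-Content-Type': mimeType,
            'X-Upload-Content-Length': String(byteLength),
        },
        body: JSON.stringify(metadata),
    };
}

// Send the request and return the session URL from the Location header.
async function openUploadSession(
    req: SessionRequest,
    signal?: AbortSignal,
): Promise<string> {
    const resp = await fetch(req.url, {
        method: req.method,
        headers: req.headers,
        body: req.body,
        signal,
    });
    const location = resp.headers.get('Location');
    if (!resp.ok || !location) {
        throw new Error(`Could not open upload session (HTTP ${resp.status})`);
    }
    return location;
}
```

Keeping the request builder separate from the network call makes the headers easy to inspect without hitting Drive.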

Chunk size and the 256 KB rule

Drive’s resumable protocol has a specific constraint: chunks have to be a multiple of 256 KB, except for the last chunk (which can be any size to finish the file). We picked 8 MB, which is 32 × 256 KB and splits 500 MB into 63 chunks (62 full chunks plus a 4 MB tail). That number matters:

  • Smaller chunks (say 1 MB) would give smoother progress updates but 8× the HTTP overhead. At 500 MB and 1 MB chunks, that’s 500 PUT requests, and the more requests there are, the more likely it is that one of them hits a transient network hiccup.
  • Larger chunks (say 64 MB) would give chunkier progress updates and bigger retry costs when a single chunk fails.
  • 8 MB is the sweet spot we landed on. 63 progress ticks for a 500 MB file is one tick every ~1.6%, which paints smoothly. Each chunk is small enough that retrying it by hand is cheap.
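The arithmetic behind those bullets can be checked with a few lines. This is a minimal sketch; the constant and function names are made up for the example.

```typescript
const KIB = 1024;
const MIB = 1024 * KIB;

// Non-final chunks must be a multiple of this size.
const DRIVE_CHUNK_GRANULARITY = 256 * KIB;
const CHUNK_SIZE = 8 * MIB;

// True when a chunk size is legal for every non-final chunk.
function isValidChunkSize(chunkSize: number): boolean {
    return chunkSize > 0 && chunkSize % DRIVE_CHUNK_GRANULARITY === 0;
}

// Number of PUT requests needed; the last chunk may be short.
function chunkCount(totalBytes: number, chunkSize: number): number {
    return Math.ceil(totalBytes / chunkSize);
}

const total = 500 * MIB;
const ticks = chunkCount(total, CHUNK_SIZE);       // 63 PUTs at 8 MB
const percentPerTick = 100 / ticks;                // about 1.6
const putsAtOneMib = chunkCount(total, 1 * MIB);   // 500 PUTs at 1 MB
```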

The Content-Range off-by-one

The chunk header has the format Content-Range: bytes {start}-{end}/{total} with the end being inclusive. So an 8 MB chunk starting at byte 0 sends bytes 0-8388607/524288000, not 0-8388608. Off by one and Drive returns a 400 with an opaque body. Worth pointing out: this is one of those bugs you only catch when you actually run it, because the spec text uses “end byte” and most readers naturally interpret that as exclusive.
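A small helper makes the inclusive end hard to get wrong. The sketch below takes the chunk length rather than an end offset, so the subtraction happens in exactly one place; the function name is an assumption for illustration.

```typescript
// Build a Content-Range value for a chunk of `chunkLength` bytes starting at
// byte `start` of a `total`-byte file. The end offset is inclusive.
function contentRange(start: number, chunkLength: number, total: number): string {
    if (chunkLength <= 0 || start < 0 || start + chunkLength > total) {
        throw new RangeError('Chunk does not fit inside the file');
    }
    const end = start + chunkLength - 1; // inclusive, hence the -1
    return `bytes ${start}-${end}/${total}`;
}

const EIGHT_MIB = 8 * 1024 * 1024;
const FILE_SIZE = 500 * 1024 * 1024;

// First chunk of a 500 MB file.
const firstRange = contentRange(0, EIGHT_MIB, FILE_SIZE);
// Final (short) chunk: starts after 62 full chunks and covers the 4 MB tail.
const lastStart = 62 * EIGHT_MIB;
const lastRange = contentRange(lastStart, FILE_SIZE - lastStart, FILE_SIZE);
```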

Drive’s response on a non-final chunk is 308 Resume Incomplete. The final chunk gets back 200 with the file metadata as JSON in the body. We parse that body to get the file id, then issue a permissions API call to make it anyone-with-link.
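Put together, the chunk loop and the follow-up permissions call might look like the sketch below. It assumes the session URL has already been obtained, and the function names and progress callback shape are hypothetical. The permissions endpoint and the role/type body follow the public Drive v3 API.

```typescript
type ChunkOutcome = 'continue' | 'done' | 'error';

// 308 means Drive wants more bytes; 200 or 201 means the file is complete.
function classifyChunkStatus(status: number): ChunkOutcome {
    if (status === 308) return 'continue';
    if (status === 200 || status === 201) return 'done';
    return 'error';
}

// Request that grants read access to anyone who has the link.
function permissionRequest(fileId: string, accessToken: string) {
    return {
        url: `https://www.googleapis.com/drive/v3/files/${encodeURIComponent(fileId)}/permissions`,
        method: 'POST' as const,
        headers: {
            Authorization: `Bearer ${accessToken}`,
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({ role: 'reader', type: 'anyone' }),
    };
}

// PUT the blob chunk by chunk; resolves with the new file id.
async function uploadChunks(
    uploadUrl: string,
    blob: Blob,
    chunkSize: number,
    onProgress?: (sent: number, total: number) => void,
    signal?: AbortSignal,
): Promise<string> {
    for (let start = 0; start < blob.size; start += chunkSize) {
        const chunk = blob.slice(start, start + chunkSize);
        const end = start + chunk.size - 1; // inclusive end offset
        const resp = await fetch(uploadUrl, {
            method: 'PUT',
            headers: { 'Content-Range': `bytes ${start}-${end}/${blob.size}` },
            body: chunk,
            signal,
        });
        const outcome = classifyChunkStatus(resp.status);
        if (outcome === 'error') {
            throw new Error(`Chunk at byte ${start} failed (HTTP ${resp.status})`);
        }
        onProgress?.(end + 1, blob.size);
        if (outcome === 'done') {
            const file = (await resp.json()) as { id: string };
            return file.id;
        }
    }
    throw new Error('Upload ended without a final response');
}
```

The loop reports progress only after Drive acknowledges a chunk, so the bar never runs ahead of what has actually been stored.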

Gzip on the sender, transparent decode on the receiver

500 MB of pretty-printed JSON compresses to roughly 30-80 MB with gzip, often less. That’s roughly a 6-16× reduction in upload time for free. The cost is the CPU time of gzip itself, but browsers ship a native streaming gzip via CompressionStream:

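// Gzip a string in the browser using the native streaming compressor.
async function gzipString(text: string): Promise<Blob> {
    const source = new Blob([text]).stream();
    const compressed = source.pipeThrough(new CompressionStream('gzip'));
    // Response drains the stream into a Blob; re-wrap it to attach the mime type.
    const body = await new Response(compressed).blob();
    return new Blob([body], { type: 'application/gzip' });
}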

We mark the uploaded file with mime type application/gzip and a custom appProperties.bjv_compressed='gzip' metadata flag. The filename gets a .json.gz suffix.
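The metadata described above could be assembled as in this sketch. The field names (name, mimeType, appProperties) are standard Drive file metadata. The helper name and the rule for deriving the filename from the original are assumptions.

```typescript
interface ShareFileMetadata {
    name: string;
    mimeType: 'application/gzip';
    appProperties: { bjv_compressed: 'gzip' };
}

// Build the Drive metadata for a gzipped share.
function shareMetadata(originalName: string): ShareFileMetadata {
    // Assumption: keep the base name and normalise the suffix to .json.gz.
    const base = originalName.replace(/\.json$/i, '');
    return {
        name: `${base}.json.gz`,
        mimeType: 'application/gzip',
        appProperties: { bjv_compressed: 'gzip' },
    };
}
```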

On the recipient side, we sniff the response Content-Type. If it includes “gzip,” we pipe the response body through a DecompressionStream on the way to the reader. Old (uncompressed) shares predate this change and come back as application/json; they bypass the decode and work unchanged. Zero migration: the recipient code reads the header and does the right thing either way.
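A minimal version of that sniffing step, with hypothetical function names, might look like this. It treats any Content-Type containing “gzip” as compressed and passes everything else through untouched, which is what keeps legacy application/json shares working.

```typescript
// True when the Content-Type indicates a gzipped share.
function isGzipContentType(contentType: string | null): boolean {
    return contentType !== null && contentType.toLowerCase().includes('gzip');
}

// Return a stream of plain JSON bytes regardless of how the share was stored.
function decodedBody(resp: Response): ReadableStream<Uint8Array> {
    if (!resp.body) throw new Error('Response has no body');
    if (!isGzipContentType(resp.headers.get('Content-Type'))) {
        return resp.body; // legacy uncompressed share: no decode step
    }
    return resp.body.pipeThrough(new DecompressionStream('gzip'));
}
```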

Progress that matches the wire

Here’s a subtle UX bug we almost shipped. The straightforward way to track download progress is response.body.getReader() and a counter incremented on each read(). But if you pipe through DecompressionStream first, the reader sees decompressed bytes — not the bytes on the wire. A 100 MB download might show as 500 MB received because that’s the decompressed size.

The fix is to put the counter before decompression using a TransformStream tap:

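// Counts bytes as they come off the wire, before any decompression.
// bytesTotal, onProgress, resp, and isGzip come from the surrounding loader.
let bytesReceived = 0;
const progressTap = new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
        bytesReceived += chunk.byteLength;
        onProgress?.({ bytesReceived, bytesTotal });
        controller.enqueue(chunk); // pass the chunk through unchanged
    },
});

// The tap sits first, so it sees compressed bytes; decompression comes after.
let stream = resp.body.pipeThrough(progressTap);
if (isGzip) stream = stream.pipeThrough(new DecompressionStream('gzip'));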

The counter accumulates compressed bytes, matching Content-Length exactly. The reader downstream gets clean decompressed bytes for the JSON parser. Two purposes, one stream.

Cancellation

Every fetch in the upload and download paths takes an AbortSignal. On the sender side, pressing Cancel during upload aborts the in-flight chunk PUT and then fires a best-effort DELETE on the upload URL to tell Drive to clean up the half-uploaded session. Drive would garbage-collect it after a week anyway, but explicit cleanup is cheap and feels right.
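The sender-side cancel path can be sketched as follows. The function name and the injectable fetchFn parameter are assumptions made so the example can be exercised without a network. The DELETE is sent without the aborted signal so that it has a chance to complete.

```typescript
type FetchLike = (url: string, init: { method: string }) => Promise<unknown>;

// Abort the in-flight PUT, then ask Drive to drop the half-uploaded session.
async function cancelUpload(
    controller: AbortController,
    uploadUrl: string,
    fetchFn: FetchLike = (url, init) => fetch(url, init),
): Promise<void> {
    controller.abort(); // rejects any fetch that holds controller.signal
    try {
        // Deliberately not passing the aborted signal here.
        await fetchFn(uploadUrl, { method: 'DELETE' });
    } catch {
        // Best effort: Drive expires abandoned sessions on its own.
    }
}
```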

On the recipient side, abort comes from the effect cleanup: if the user navigates away mid-download, the component unmounts, the React effect’s cleanup function runs, and we call controller.abort(). The fetch rejection is caught and translated to a DriveError('ABORTED') that the loader recognizes as “not a real failure” and silently drops without showing an error state or firing an analytics event. The user left; we don’t need to complain about it.
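One way to express that translation is sketched below. DriveError and the 'ABORTED' code come from the description above, but the class shape, the 'NETWORK' fallback code, and the helper names are assumptions.

```typescript
// Hypothetical shape for the error type mentioned in the text.
class DriveError extends Error {
    readonly code: string;
    constructor(code: string, message: string = code) {
        super(message);
        this.name = 'DriveError';
        this.code = code;
    }
}

// Map whatever fetch rejected with onto a DriveError.
function toDriveError(err: unknown): DriveError {
    if (err instanceof DriveError) return err;
    // An aborted fetch rejects with an error whose name is 'AbortError'.
    const name = (err as { name?: unknown } | null)?.name;
    if (name === 'AbortError') return new DriveError('ABORTED');
    // 'NETWORK' is a placeholder code for everything else.
    return new DriveError('NETWORK', err instanceof Error ? err.message : String(err));
}

// Aborts are user intent, not failures: no error UI, no analytics.
function shouldReport(err: DriveError): boolean {
    return err.code !== 'ABORTED';
}
```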

What we explicitly didn’t build

Cross-session resume. Drive’s upload URLs are valid for a week, so technically you could persist the URL and resume after a page refresh. Doing so would require storing an OAuth token in localStorage to authenticate the resumed PUT requests — which violates our “token only in memory” rule and adds a significant privacy footprint. Refresh the page mid-upload and you start over. Most users either don’t refresh or recover gracefully; the few who lose work to this can retry. We pick the privacy model.

Auto-retry on chunk failure. If a PUT returns 5xx, we surface the error and let the user press Retry. Silent retries hide bad network conditions; explicit retries make them visible. For a high-stakes flow like sharing, “something failed, you can see exactly what and choose what to do” is the right default.
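How a manual Retry finds its starting point is not spelled out here. One option the resumable protocol documents is a status probe: an empty PUT with Content-Range: bytes */{total}, which Drive answers with 308 and a Range header naming the bytes it already holds. The sketch below shows only the parsing side, with hypothetical helper names, and should be read as an assumption about how Retry could be wired rather than a description of the app.

```typescript
// Headers for an empty PUT that asks Drive how much it has stored so far.
function statusProbeHeaders(totalBytes: number): Record<string, string> {
    return { 'Content-Range': `bytes */${totalBytes}` };
}

// Turn the Range header of a 308 response into the next byte to send.
function nextOffsetFromRange(rangeHeader: string | null): number {
    if (rangeHeader === null) return 0; // nothing stored yet: start over
    const match = /^bytes=0-(\d+)$/.exec(rangeHeader.trim());
    if (!match) throw new Error(`Unexpected Range header: ${rangeHeader}`);
    return Number(match[1]) + 1; // the header's end offset is inclusive
}
```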

Server-side streaming. All of this runs in the browser. We don’t have a server in the loop, so we can’t fan out chunks in parallel through a proxy or pre-warm a TLS connection. Sequential PUTs at 60-80 MB/s on a fast connection are fast enough that the gzip step usually dominates total time anyway.

The shape of the final pipeline

Putting it all together, from clicking Share to a share link on the user’s clipboard:

  1. User clicks Share. Modal opens; we set a flag and start the prepare step in requestIdleCallback.
  2. JSON.stringify(root) on the main thread. For a 500 MB file this takes 3-8 seconds and freezes the UI. We show a “Preparing JSON…” spinner.
  3. gzipString(text) via CompressionStream. Another 3-8 seconds. Compressed Blob is held in memory.
  4. User clicks “Share & get link”. We open a resumable upload session via POST.
  5. Loop: PUT 8 MB chunks. On each successful response we call onProgress; the modal’s progress bar updates.
  6. Final chunk returns file metadata. We call permissions API to make anyone-with-link.
  7. Build the share URL with the file id, the selected key path, the note (if any), and the original filename, all embedded in the URL hash.
  8. Copy to clipboard. Done.
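Step 7 can be sketched with URLSearchParams doing the encoding. The parameter names (id, path, note, name), the viewer URL, and both helper names are invented for the example; only the idea of carrying everything in the hash comes from the steps above.

```typescript
interface ShareLinkParts {
    fileId: string;
    path?: string; // selected key path
    note?: string;
    name?: string; // original filename
}

// Pack the share details into the URL hash, which is not sent to the server.
function buildShareUrl(viewerUrl: string, parts: ShareLinkParts): string {
    const params = new URLSearchParams();
    params.set('id', parts.fileId);
    if (parts.path) params.set('path', parts.path);
    if (parts.note) params.set('note', parts.note);
    if (parts.name) params.set('name', parts.name);
    const url = new URL(viewerUrl);
    url.hash = params.toString();
    return url.toString();
}

// Inverse of buildShareUrl, as the recipient side would run it.
function parseShareUrl(shareUrl: string): ShareLinkParts {
    const params = new URLSearchParams(new URL(shareUrl).hash.slice(1));
    const fileId = params.get('id');
    if (!fileId) throw new Error('Share link has no file id');
    return {
        fileId,
        path: params.get('path') ?? undefined,
        note: params.get('note') ?? undefined,
        name: params.get('name') ?? undefined,
    };
}
```

In a browser, the resulting string would then go to navigator.clipboard.writeText for step 8.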

The whole flow for a 500 MB JSON, on a typical home connection, takes about 25-40 seconds. The same JSON without gzip would take 8-15 minutes. That’s the difference between a feature people use and a feature people open once and never touch again.