Hi all, resumable file uploads have been at the top of the list of features to add to the RS spec, specifically because they would let us handle large files where a normal single HTTP POST would be impractical.
There’s been some discussion about it here:
General Approach
I’ve been toying with a proof of concept for this, and while it’s not yet complete, the general idea would be as follows:
- Select a file in browser
- Generate a checksum of the entire binary blob
- Chop it up into reasonably sized binary chunks
- Generate a checksum of the payload?
- Make a series of POSTs to the RS server, using custom headers to indicate the byte range/total, the checksum of the payload, and the checksum of the final file?
- On the RS side, put the pieces back together and verify the checksum
- Send back header response based on validation of the checksum(s)
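The chunking in steps 3 and 5 could be sketched roughly as follows. The helper name and the chunk size are just placeholders, and the end offset is exclusive (matching `Blob.slice()` and the example headers below):

```javascript
// Sketch (hypothetical helper): compute the byte ranges for a chunked upload.
// Each range maps to one POST; `chunkSize` is an arbitrary choice here.
function chunkRanges(totalSize, chunkSize) {
  const ranges = [];
  for (let start = 0; start < totalSize; start += chunkSize) {
    // `end` is exclusive, like Blob.slice(start, end)
    const end = Math.min(start + chunkSize, totalSize);
    ranges.push({ start, end });
  }
  return ranges;
}

// Example: a 3,472,612-byte file in 512 KiB chunks → 7 POSTs
const ranges = chunkRanges(3472612, 524288);
```

In the browser, each range would then be cut out of the `File` with `file.slice(start, end)` before being checksummed and POSTed.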
I’ve got steps 1-4 completed in a simple demo app, along with half of step 5 (I’m generating headers based on the data, but not submitting to the RS server yet):
Here are the custom headers I’m using at the moment. If anyone has suggestions for existing headers we could use in place of any of these, feel free to chime in here so we can improve this as we go.
X-Content-ID: d41d8cd98f00b204e9800998ecf8427e
X-Content-Range-ID: f1ba78aaee2fce91983793c8b90a38a5
X-Content-Range: bytes 2621440-3145728/3472612
Content-Type: image/jpeg
So, in this case, X-Content-ID is the checksum of the original data loaded into the browser, and X-Content-Range-ID is the checksum of this POST’s payload. X-Content-Range gives the byte range (start-end) of this POST’s payload, followed by a slash and the total size in bytes of the data being sent. Finally, Content-Type is the only “standard” header so far; it indicates the file type.
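Building those headers for a given chunk is straightforward; a minimal sketch (function name and parameters are hypothetical, values mirror the example above):

```javascript
// Sketch: assemble the custom headers for one chunk's POST.
// All names here are the proposed custom headers from this thread.
function chunkHeaders(fileChecksum, chunkChecksum, start, end, totalSize, mimeType) {
  return {
    "X-Content-ID": fileChecksum,        // checksum of the entire original file
    "X-Content-Range-ID": chunkChecksum, // checksum of this chunk's payload
    "X-Content-Range": `bytes ${start}-${end}/${totalSize}`,
    "Content-Type": mimeType,            // type of the final file
  };
}
```

These could then be passed straight to `fetch()` or `XMLHttpRequest` as the request headers for each chunk.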
- Question What should the server responses look like?
- Question What should the server do with each payload until it’s ready to re-constitute the file? Temp storage dir? User storage?
Optimization & Performance
In my experience dealing with large binary blobs in the browser (several gigabytes, for example), there are some less-than-ideal side effects:
- Loading the entire file into the browser makes memory usage skyrocket.
- Processing large binary files causes performance issues, and when done in the main thread it can make the UI completely unresponsive.
To account for these issues, we’d probably need to implement one or more web workers to handle different parts of the process. Since web workers run in their own threads, we can keep as much work as possible off the main thread, so the app itself isn’t affected too heavily. To reduce memory consumption, we can discard payloads once they’ve been uploaded to the server; memory usage would likely peak up front and then drop after each POST.
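The discard-after-upload idea could be sketched as a sequential loop that only ever holds one chunk in memory. `getChunk` and `postChunk` are hypothetical stand-ins for a `Blob.slice()`-based read (possibly inside a web worker) and the actual POST to the RS server:

```javascript
// Sketch: upload chunks one at a time, dropping each chunk's reference
// once its POST completes so the memory can be reclaimed.
async function uploadSequentially(ranges, getChunk, postChunk) {
  for (const { start, end } of ranges) {
    let chunk = getChunk(start, end); // read only this slice of the file
    await postChunk(chunk, start, end);
    chunk = null; // drop the reference so the chunk is eligible for GC
  }
}
```

Uploading strictly in sequence keeps peak memory at roughly one chunk; parallel POSTs would be faster but would hold several chunks in memory at once, which is exactly the trade-off worth discussing here.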
- Question Memory leaks can be tricky. How best to handle cleanup? Can we discard the entire web-worker after each upload?
Your thoughts?
These are just my general thoughts on the topic without having dug into the more difficult implementation details yet. If anyone has ideas or suggestions, feel free to comment here, and hopefully we can get something done, if not in the next spec draft, then in the following one in the spring.
Cheers
-Nick