A large-file upload server with support for HTTP resumable uploads

Source: blog.csdn.net/ababab12345/article/details/80490621
Website: https://idea.markerhub.com

The author's R&D group recently needed high-performance HTTP upload of large files, with support for resumable uploads. Here is a brief summary of the requirements for future reference:

  1. The server is implemented in C, rather than in an interpreted or VM-based language such as Java or PHP;
  2. The server writes received data to disk immediately, so there is no need for calls such as move_uploaded_file or InputStreamReader that buffer the upload first; this keeps server memory usage low and avoids browser request timeouts;
  3. Both HTML5 and IFRAME (for old browsers) uploads are supported, and the upload progress can be obtained.

To better suit today's mobile Internet, the upload service must support resumable uploads and reconnection after a dropped connection. Mobile networks are not very stable, and a large upload is quite likely to be interrupted partway through. Resumable upload is needed to avoid starting over from the beginning.

The idea behind resumable upload is as follows:

The client (usually a browser) uploads a file to the server, and the progress of the upload is recorded continuously. If the connection drops or another error occurs, the client can ask the server how much of the file has already been uploaded, and then continue uploading from that position.
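
The exchange described above can be sketched with an in-memory stand-in for the server. Everything in this example (`FakeUploadServer`, `uploadFrom`, the `"demo"` ID) is hypothetical and only illustrates the query-then-continue idea; it is not the author's server or its protocol.

```javascript
// Hypothetical in-memory stand-in for the upload server.
class FakeUploadServer {
  constructor() {
    this.stored = new Map(); // fileid -> array of received bytes
  }
  // Number of bytes already received for this file ID (0 if unknown).
  uploadedBytes(fileid) {
    return (this.stored.get(fileid) || []).length;
  }
  // Append newly received bytes to what is already stored.
  append(fileid, bytes) {
    const prev = this.stored.get(fileid) || [];
    this.stored.set(fileid, prev.concat(Array.from(bytes)));
  }
}

// Ask the server how far it got, then send at most `maxBytes` more.
// A finite `maxBytes` simulates a connection that drops mid-transfer.
function uploadFrom(server, fileid, data, maxBytes) {
  const offset = server.uploadedBytes(fileid);
  const end = Math.min(data.length, offset + maxBytes);
  server.append(fileid, data.slice(offset, end));
  return end; // total bytes on the server after this attempt
}

const server = new FakeUploadServer();
const data = Uint8Array.from([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);

const afterDrop = uploadFrom(server, "demo", data, 4);          // interrupted
const afterResume = uploadFrom(server, "demo", data, Infinity); // resumed
console.log(afterDrop, afterResume);
```

The second call sends only the bytes the server does not yet have, which is the whole point of querying the offset first.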

Some developers online upload large files in fragments instead: the file is cut into small pieces, for example 4 MB each; the server saves each piece it receives as a temporary file, and merges the pieces once all of them have arrived. In the author's view this works if the original file is small enough, but once a file reaches hundreds of megabytes, several gigabytes or tens of gigabytes, merging takes a long time and often leads to browser response timeouts or a blocked server.

If you implement a standalone client (or a browser ActiveX plug-in) to upload files, supporting resumable upload is very simple: you only need to record the upload state on the client. Supporting resumable upload in the browser itself, with no third-party plug-in, is somewhat harder than with a standalone client, but still not difficult. My approach is as follows:

1. Before the browser uploads a file, it first generates a hash value for the file. This must be done on the browser side.

Upload records cannot be looked up by file name alone, because file names collide very often. A value made of the file name plus the file size collides less, and adding the file's modification time reduces collisions further. Adding a browser ID reduces them further still. The most reliable hash is an MD5 of the file's contents, but that is very expensive to compute (and in fact unnecessary here), and the time it takes would hurt the upload experience.

Based on the above, my hash calculation works as follows:

  1. First give the browser an ID, which is stored in a cookie;
  2. Compute the MD5 of browser ID + file modification time + file name + file size, and use the result as the file's hash value;
  3. The browser ID is assigned automatically when the browser first visits the file upload site.

// Simple cookie helper functions
function setCookie(cname,cvalue,exdays)  
{  
  var d = new Date();  
  d.setTime(d.getTime()+(exdays*24*60*60*1000));  
  var expires = "expires="+d.toUTCString(); // toGMTString() is deprecated
  document.cookie = cname + "=" + cvalue + "; " + expires;  
}  
   
   
function getCookie(cname)  
{  
  var name = cname + "=";  
  var ca = document.cookie.split(';');  
  for(var i=0; i<ca.length; i++)   
  {  
    var c = ca[i].trim();  
    if (c.indexOf(name)==0) return c.substring(name.length,c.length);  
  }  
  return "";  
}  
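//
// Simple file hash calculation. Unless you are very demanding, it should be usable in a product.
// Because several pieces of data go into the hash, collisions within the HYFileUploader system should be very unlikely, so it should be safe to use.
// Any algorithm can be used to obtain the file ID, as long as the same file always gets the same ID; the ID should not exceed 32 bytes.
//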
function getFileId (file)   
{  
    // Give the browser a unique ID to tell browser instances apart (different machines, or browsers from different vendors on the same machine)
    var clientid = getCookie("HUAYIUPLOAD");  
    if (clientid == "") {  
        // Use a random value as the browser ID; it becomes part of the file hash
        var rand = Math.floor(Math.random() * 1000);
        var t = (new Date()).getTime();  
        clientid =rand+'T'+t;  
          
        setCookie("HUAYIUPLOAD",clientid,365);  
    }  
      
    var info = clientid;  
    if (file.lastModified)  
        info += file.lastModified;  
    if (file.name)  
        info += file.name;  
    if (file.size)  
        info += file.size;  
    // md5() comes from blueimp-md5: https://cdn.bootcss.com/blueimp-md5/2.10.0/js/md5.min.js
    var fileid = md5(info);  
    return fileid;  
}  

The author's view: there is no need to compute the hash by reading the file's contents, which would be very slow. If you really need instant upload (where a file whose contents are already on the server is not transferred again), you would have to hash the contents, so that identical files uploaded by different people can be detected and the result returned directly without a repeated upload.

Assigning an ID to each browser further avoids hash collisions between files with the same name and the same size on other computers.
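
One detail worth noting when combining these fields: plain string concatenation can make different files look identical (a file named `a1` of size 23 and a file named `a12` of size 3 both end in `a123`). The sketch below is one possible variation, not the author's code; `newClientId` and `buildFileKey` are hypothetical names, and the use of `crypto.randomUUID()` is an assumption that falls back to a weaker ID where it is unavailable.

```javascript
// Browser ID: prefer crypto.randomUUID() where available, otherwise
// fall back to a timestamp plus a random suffix (weaker, but never empty).
function newClientId() {
  if (typeof crypto !== "undefined" && typeof crypto.randomUUID === "function") {
    return crypto.randomUUID();
  }
  return Date.now().toString(36) + "-" + Math.random().toString(36).slice(2);
}

// Join the metadata with a separator so that field boundaries stay
// unambiguous. The file name goes last because it is the only field
// that may itself contain the separator.
function buildFileKey(clientId, file) {
  return [clientId, file.lastModified || 0, file.size || 0, file.name || ""].join("|");
}

const keyA = buildFileKey("client-1", { name: "a1", size: 23, lastModified: 5 });
const keyB = buildFileKey("client-1", { name: "a12", size: 3, lastModified: 5 });
console.log(keyA !== keyB);
```

The resulting key would then be passed to `md5()` in place of the concatenated `info` string, which keeps the ID within the 32-byte limit mentioned in the comments above.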

2. Query the upload status by the file's hash value

Before uploading, first query the upload server for the file's progress information using the file's hash value, and then start uploading from the position already reached. The code is as follows:

var fileObj = currentfile;  
var fileid = getFileId(fileObj);  
var t = (new Date()).getTime();  
// Fetch the resume information for the file from the URL below. fileid is required; the t parameter is appended to prevent browser caching
var url = resume_info_url + '?fileid='+fileid + '&t='+t;  
  
var ajax = new XMLHttpRequest();  
  
ajax.onreadystatechange = function () {   
    if(this.readyState == 4){  
        if (this.status == 200){  
            var response = this.responseText;  
              
            var result = JSON.parse(response);  
            if (!result) {  
                alert('The server returned invalid data; it may be an incompatible server');
                return;  
            }  
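            // The file object in the resume information contains the number of bytes already uploaded
            var uploadedBytes = (result.file && result.file.size) || 0;
            // No record on the server means nothing was uploaded yet: start from 0
            if (!result.file || (!result.file.finished && uploadedBytes < fileObj.size)) {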
                upload_file(fileObj,uploadedBytes,fileid);  
            }  
            else {  
                // The file has already been fully uploaded; do not upload it again, just show the result
                showUploadedFile(result.file);
                // Simulate completed progress
                //var progressBar = document.getElementById('progressbar');  
                //progressBar.value = 100;  
            }  
              
        }else {  
            alert('Failed to get the resume information for the file');
        }    
    }   
}  
  
ajax.open('get',url,true);  
ajax.send(null);  

The above is implemented with the jQuery-file-upload component. For an implementation in plain JavaScript, see the h4resume.html sample in the demos directory.
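
The decision made inside the callback can also be isolated into small pure functions, which makes it easier to test without a server. This is a sketch only: `buildResumeUrl` and `planUpload` are hypothetical names, and the response shape (`file.size`, `file.finished`) is inferred from the code above rather than from any server documentation.

```javascript
// Build the query URL; `t` only defeats caching, as in the code above.
function buildResumeUrl(baseUrl, fileid, now) {
  return baseUrl + "?fileid=" + encodeURIComponent(fileid) + "&t=" + now;
}

// Decide what to do from the parsed server response.
// Assumed shape: { file: { size: <bytes received>, finished: <bool> } }.
function planUpload(result, fileSize) {
  const info = result && result.file;
  if (!info) {
    return { action: "upload", offset: 0 }; // nothing stored yet
  }
  const received = Number(info.size) || 0;
  if (info.finished || received >= fileSize) {
    return { action: "done", offset: fileSize }; // nothing left to send
  }
  return { action: "upload", offset: received }; // continue from here
}

const plan = planUpload({ file: { size: 100, finished: false } }, 500);
console.log(plan.action, plan.offset);
```

With such a helper, the callback would only need to call `upload_file(fileObj, plan.offset, fileid)` when `plan.action` is `"upload"`, and `showUploadedFile(result.file)` otherwise.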

3. Perform the upload

After the resume information has been queried, if part of the file was uploaded earlier, the server returns the number of bytes it has already received, and we can then upload the data starting from that offset.

The slice() method of the HTML5 File object can be used to cut out part of the file for upload.

Definition and usage

The slice() method extracts part of a file and returns the extracted part as a new Blob. The original file is not modified.

Syntax

File.slice(start,end)

Parameters

start: The byte offset at which the extracted segment begins. If it is negative, it is counted from the end of the file: -1 refers to the last byte, -2 to the second-to-last byte, and so on.

end: The byte offset immediately after the end of the segment to be extracted. If it is omitted, the segment runs from start to the end of the file. If it is negative, it is counted from the end of the file.
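
A small illustration of these parameters, assuming an environment that provides a global `Blob` (modern browsers and recent Node.js versions). A `Blob` built from a string is used here only so the example can run without a file picker; `File` inherits `slice()` from `Blob`, so the calls are the same.

```javascript
// File inherits slice() from Blob, so the same calls work on a File
// obtained from <input type="file">.
const blob = new Blob(["0123456789"]); // 10 bytes

const middle = blob.slice(2, 5);  // bytes 2, 3 and 4
const tail = blob.slice(6);       // from byte 6 to the end
const lastThree = blob.slice(-3); // negative start counts from the end

console.log(middle.size, tail.size, lastThree.size);
```

For a resumed upload, the relevant form is the second one: `fileObj.slice(start_offset)` yields everything the server has not yet received.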

The code that uploads a segment of the file is as follows:

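/*
File upload handling code
fileObj: HTML5 File object
start_offset: starting position of the data to upload, relative to the beginning of the file
fileid: the file's ID, obtained from the getFileId function above
*/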
function upload_file(fileObj,start_offset,fileid)  
{  
 var xhr = new XMLHttpRequest();  
 var formData = new FormData();  
   
 var blobfile;  
   
 if(start_offset >= fileObj.size){  
  return false;  
 }  
   
 var bitrateDiv = document.getElementById("bitrate");  
 var finishDiv = document.getElementById("finish");  
 var progressBar = document.getElementById('progressbar');  
 var progressDiv = document.getElementById('percent-label');  
   
 var oldTimestamp = 0;  
 var oldLoadsize = 0;  
 var totalFilesize = fileObj.size;  
 if (totalFilesize == 0) return;  
   
 var uploadProgress = function (evt) {  
  if (evt.lengthComputable) {  
   var uploadedSize = evt.loaded + start_offset;   
   var percentComplete = Math.round(uploadedSize * 100 / totalFilesize);  
   
   var timestamp = (new Date()).valueOf();  
   var isFinish = evt.loaded == evt.total;  
   
   if (timestamp > oldTimestamp || isFinish) {  
    var duration = timestamp - oldTimestamp;  
    if (duration > 500 || isFinish) {  
     var size = evt.loaded - oldLoadsize;  
   
     var bitrate = (size * 8 / duration /1024) * 1000; //kbps  
     if (bitrate > 1000)  
      bitrate = Math.round(bitrate / 1000) + 'Mbps';  
     else  
      bitrate = Math.round(bitrate) + 'Kbps';  
   
     var finish = evt.loaded + start_offset;  
   
     if (finish > 1048576)  
      finish = (Math.round(finish / (1048576/100)) / 100).toString() + 'MB';  
     else  
      finish = (Math.round(finish / (1024/100) ) / 100).toString() + 'KB';  
   
     progressBar.value = percentComplete;  
     progressDiv.innerHTML = percentComplete.toString() + '%';  
     bitrateDiv.innerHTML = bitrate;  
     finishDiv.innerHTML = finish;  
   
     oldTimestamp = timestamp;  
     oldLoadsize = evt.loaded;  
    }  
   }  
  }  
  else {  
   progressDiv.innerHTML = 'N/A';  
  }  
 }  
   
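 xhr.onreadystatechange = function(){
  // Only inspect the status once the request has completed
  if (xhr.readyState != 4) return;
  if (xhr.status == 200) {
   console.log( xhr.responseText );
  }
  else if (xhr.status == 400) {
   // Bad request: no special handling here
  }
 };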
   
 var uploadComplete = function (evt) {  
  progressDiv.innerHTML = '100%';  
   
  var result = JSON.parse(evt.target.responseText);  
  if (result.result == 'success') {  
   showUploadedFile(result.files[0]);  
  }  
  else {  
   alert(result.msg);  
  }  
 }  
   
 var uploadFailed = function (evt) {  
  alert("File upload failed!");
 }  
   
 var uploadCanceled = function (evt) {  
  alert("The upload was canceled or the browser dropped the connection!");
 }  
   
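 // Timeout setting: because large files are being uploaded, never set a timeout
 //xhr.timeout = 20000;
 //xhr.ontimeout = function(event){
  //  alert('The upload took too long; the server did not respond within the allotted time!');
  //}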
   
 xhr.overrideMimeType("application/octet-stream");   
   
 var filesize = fileObj.size;  
 var blob = fileObj.slice(start_offset,filesize);  
 var fileOfBlob = new File([blob], fileObj.name);  
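 // Additional form fields should come before the file data in the request
 formData.append('filename', fileObj.name);
 // The fileid must be sent to the server; the server only treats the upload as resumable once it has the fileid
 formData.append('fileid', fileid);
 // Put the file data in the last field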
 //formData.append("file",blob, fileObj.name);  
 formData.append('file', fileOfBlob);  
   
 xhr.upload.addEventListener("progress", uploadProgress, false);  
   
 xhr.addEventListener("load", uploadComplete, false);  
 xhr.addEventListener("error", uploadFailed, false);  
 xhr.addEventListener("abort", uploadCanceled, false);  
 xhr.open('POST', upload_file_url);  
 //  
 xhr.send(formData);  
}  

To verify resumable upload, the author made a simple page that displays status information while a file is being uploaded. The interface looks like this:

[Image: screenshot of the upload status interface]

With HTML5, you can compute the upload progress, the amount of data uploaded so far, the upload bit rate and other information. If an error occurs during the upload, you can simply upload the file again; the part that was already uploaded does not need to be sent again.
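
The progress arithmetic can be isolated into two small pure functions. This is a sketch with hypothetical names (`percentComplete`, `formatBitrate`); the key point is that the progress event's `loaded` value counts only the bytes sent in the current request, so the resume offset has to be added to get the overall progress.

```javascript
// loaded:      bytes sent in the current request (evt.loaded)
// startOffset: bytes the server already had before this request
// totalSize:   full size of the file
function percentComplete(loaded, startOffset, totalSize) {
  if (totalSize <= 0) return 0;
  return Math.round((loaded + startOffset) * 100 / totalSize);
}

// Average rate over one sampling interval, using the same units and
// thresholds as the upload code above.
function formatBitrate(bytesDelta, durationMs) {
  if (durationMs <= 0) return "N/A";
  const kbps = (bytesDelta * 8 / 1024) / (durationMs / 1000);
  return kbps > 1000
    ? Math.round(kbps / 1000) + "Mbps"
    : Math.round(kbps) + "Kbps";
}

// Resumed at 250 bytes of a 1000-byte file, 250 more bytes sent so far.
console.log(percentComplete(250, 250, 1000));
console.log(formatBitrate(64000, 1000));
```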

To try out HTML5 resumable upload, you can download this file upload server from GitHub for testing.

https://github.com/wenshui2008/UploadServer