Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove controls unicode #77

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion lib/urllib.js
Original file line number Diff line number Diff line change
Expand Up @@ -671,7 +671,7 @@ function parseJSON(data) {
data: null
};
try {
result.data = JSON.parse(data);
result.data = JSON.parse(data.replace(/[\u0000-\u001f\u0080-\u009f]/g, ""));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

加个参数才替换吧,这样你的场景不可能让微信改的,也能实现。

function parseJSON(data, trimControlCharacters) {
  // ...
  if (trimControlCharacters) {
    data = data.replace(/[\u0000-\u001f\u0080-\u009f]/g, "");
  }
  // json parse data
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

完整的实现应该不是替换成空白符,而是替换成正确的 encode 字符。

var escMap = {
  '"': '\\"',       // \u0022
  '\\': '\\\\',     // \u005c
  '\b': '\\b',    // \u0008 
  '\f': '\\f',      // \u000c
  '\n': '\\n',    // \u000a
  '\r': '\\r',      // \u000d
  '\t': '\\t'       // \u0009
};
var escFunc = function (m) { 
  return escMap[m] || '\\u' + (m.charCodeAt(0) + 0x10000).toString(16).substr(1); 
};
var escRE = /[\u0000-\u001F\u005C]/g;
data = data.replace(escRE, escFunc);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我测试了一下,只有 \u0000-\u001F 会出问题, \u0080-\u009f 不会有问题的。

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

呃。。。JSON.parse('{"foo": "bar \u005c"}') 会有问题

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var escRE = /[\u0000-\u001F\u005C]/g;

@fengmk2

  • \u005C 不能替换,否则会将合法JSON String异化。Polyfill给出的是stringify的实现与parse是不同的。
  • escFunc会导致case digest auth success in httpbin过不了。
  • 数据测试后发现C0控制(0000-001F)符还是用空字符替换比较稳妥,虽然有部分失真。

} catch (err) {
if (err.name === 'SyntaxError') {
err.name = 'JSONResponseFormatError';
Expand Down
3 changes: 3 additions & 0 deletions test/fixtures/server.js
Original file line number Diff line number Diff line change
Expand Up @@ -159,6 +159,9 @@ var server = http.createServer(function (req, res) {
} else if (req.url === '/errorcharset') {
res.setHeader('Content-Type', 'text/plain;charset=notfound');
return res.end('你好');
} else if (req.url === '/json_with_controls_unicode') {
res.writeHeader(200);
return res.end(new Buffer('{"foo":"\u000e\u0086"}'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这种编码,明显是微信服务端没有正确处理好 json 特殊字符编码导致的。。。

}

var url = req.url.split('?');
Expand Down
12 changes: 12 additions & 0 deletions test/urllib.test.js
Original file line number Diff line number Diff line change
Expand Up @@ -1161,4 +1161,16 @@ describe('urllib.test.js', function () {
});
});
});

describe('json string with controls unicode', function() {
it('should handle GET /json_with_controls_unicode with dataType=json', function (done) {
urllib.request(host + '/json_with_controls_unicode', {
dataType: 'json'
}, function (err, data, res) {
res.should.status(200);
data.should.eql({foo:""});
done();
});
});
});
});