 e59f39d403
			
		
	
	
We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
\xF5..\xFF) in the lexer.  That's insufficient; there's plenty of
invalid UTF-8 not containing these bytes, as demonstrated by
check-qjson:
* Malformed sequences
  - Unexpected continuation bytes
  - Missing continuation bytes after start bytes other than
    \xC0..\xC1, \xF5..\xFD.
* Overlong sequences with start bytes other than \xC0..\xC1,
  \xF5..\xFD.
* Invalid code points

Fixing this in the lexer would be bothersome.  Fixing it in the parser
is straightforward, so do that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-23-armbru@redhat.com>
		
	
			
		
			
				
	
	
		
8 lines | 182 B | C
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
#ifndef QEMU_UNICODE_H
#define QEMU_UNICODE_H

int mod_utf8_codepoint(const char *s, size_t n, char **end);
ssize_t mod_utf8_encode(char buf[], size_t bufsz, int codepoint);

#endif
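
The classes of invalid input listed in the commit message can be illustrated with a small decoder. This is only a sketch and not the code behind `mod_utf8_codepoint()`: the name `sketch_utf8_decode` and its signature are hypothetical, and it checks strict UTF-8. A modified UTF-8 decoder would presumably also accept \xC0\x80 as an encoding of NUL. Of the invalid code points, only surrogates and values above U+10FFFF are rejected here.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not QEMU's implementation: decode one strict
 * UTF-8 sequence from the first n bytes of s.
 * Returns the code point, or -1 if the sequence is invalid.
 * *len is set to the number of bytes examined (0 only when n == 0).
 */
int sketch_utf8_decode(const unsigned char *s, size_t n, size_t *len)
{
    /* Smallest code point that legitimately needs 1..4 bytes */
    static const int min_cp[5] = { 0, 0x00, 0x80, 0x800, 0x10000 };
    size_t need, i;
    int cp;

    *len = 0;
    if (n == 0) {
        return -1;
    }
    *len = 1;

    if (s[0] < 0x80) {
        return s[0];                    /* ASCII */
    } else if (s[0] < 0xC0) {
        return -1;                      /* unexpected continuation byte */
    } else if (s[0] < 0xE0) {
        need = 2;
        cp = s[0] & 0x1F;
    } else if (s[0] < 0xF0) {
        need = 3;
        cp = s[0] & 0x0F;
    } else if (s[0] < 0xF8) {
        need = 4;
        cp = s[0] & 0x07;
    } else {
        return -1;                      /* \xF8..\xFF never start a sequence */
    }

    for (i = 1; i < need; i++) {
        if (i >= n || (s[i] & 0xC0) != 0x80) {
            return -1;                  /* missing continuation byte */
        }
        cp = (cp << 6) | (s[i] & 0x3F);
        *len = i + 1;
    }

    if (cp < min_cp[need]) {
        return -1;                      /* overlong encoding */
    }
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;                      /* beyond Unicode, or a surrogate */
    }
    return cp;
}
```

None of the failing cases except the \xF8..\xFF branch depends on the start byte alone, which is why a check for forbidden bytes cannot catch them: \xE0\x80\x80 (overlong), \xED\xA0\x80 (surrogate) and a truncated \xE2\x82 are all built from bytes that also occur in valid sequences.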