---
theme: gaia
_class: lead
paginate: true
backgroundColor:
backgroundImage:
style: |
  section.photo h1, section.photo h2, section.photo h3, section.photo h4, section.photo h5, section.photo h6 {
    background-color: #888;
    color: #FFF;
  }
  h6 {
    font-size: 30%;
  }
  img[alt~="centre"] {
    display: block;
    margin: 0 auto;
  }
marp: true
---
- A String is just a String, right?
- A Brief History of the String
- Not all Strings are alike
  - String
  - Byte String
  - OS String
  - C Strings
let s: String = "Hi 😀!".to_owned();
dbg!(&s);
dbg!(s.len());
dbg!(s.bytes().count());
dbg!(s.chars().count());
- A Vector of `u8` inside
- Iterates as 32-bit `char`
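To make the two views of a `String` concrete, here is a minimal sketch. The helper name `byte_and_char_counts` is made up for this slide and is not part of the standard library; the rest uses only `str` methods.

```rust
/// Hypothetical helper: returns (length in bytes, number of `char`s).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let s = "Hi 😀!";
    // 'H', 'i', ' ' and '!' take one byte each; the emoji takes four.
    let (bytes, chars) = byte_and_char_counts(s);
    println!("{} bytes, {} chars", bytes, chars);

    // Slicing works on byte offsets and must land on char boundaries.
    println!("{:?}", s.get(3..7)); // the emoji on its own
    println!("{:?}", s.get(3..5)); // would cut the emoji in half, so None
}
```

`len()` counts the `u8` values in the underlying buffer, while `chars()` decodes them into Unicode scalar values on the fly.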
let s: [u8; 13] = b"Hello, world!".to_owned();
dbg!(&s);
dbg!(s.len());
- Iterates as octets (`u8`)
- An array or Vector of octets (`u8`) inside
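A byte string carries no promise about its encoding, so turning it into text is a checked step. The sketch below assumes a made-up helper, `bytes_to_text`, wrapped around the standard `std::str::from_utf8`.

```rust
/// Hypothetical helper: interpret raw octets as text, if they are valid UTF-8.
fn bytes_to_text(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    let greeting: &[u8; 13] = b"Hello, world!";
    // Iterating a byte string yields plain u8 values, not chars.
    for octet in greeting.iter().take(5) {
        print!("{:#04x} ", octet);
    }
    println!();

    // ASCII is a subset of UTF-8, so this conversion succeeds.
    println!("{:?}", bytes_to_text(greeting));
    // 0xFF can never appear in UTF-8, so this one fails.
    println!("{:?}", bytes_to_text(&[0x48, 0x69, 0xFF]));
}
```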
- Computers work in numbers
- Humans like to write words
- Words are made of characters
- Technically grapheme clusters
- Is ï one character or two?
- We need a conversion table!
- AKA: A Character Set
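The "one character or two?" question can be checked directly. This is a small sketch using only the standard library; `sizes` is a hypothetical helper. Counting grapheme clusters themselves is not something `std` offers, so that part would need an external crate.

```rust
/// Hypothetical helper: (UTF-8 bytes, Unicode scalar values) for a string.
fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // One precomposed scalar value: U+00EF LATIN SMALL LETTER I WITH DIAERESIS.
    let precomposed = "\u{00EF}";
    // Two scalar values: 'i' followed by U+0308 COMBINING DIAERESIS.
    let decomposed = "i\u{0308}";

    println!("{} -> {:?}", precomposed, sizes(precomposed));
    println!("{} -> {:?}", decomposed, sizes(decomposed));
    // Both render as one grapheme cluster, yet they compare unequal.
    println!("equal? {}", precomposed == decomposed);
}
```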
- Morse Code
- Telegraph / Baudot codes
- BCD
- EBCDIC
- ASA X3.4-1963
- aka ASCII
- We get 128 more characters!
- MS-DOS Code Page 437, 850, ...
- Windows Code Page 1252, 1250, ...
- Macintosh Code Page 1275, 1282, ...
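The catch with those extra 128 characters is that every code page spends them differently. As a rough illustration, the sketch below hard-codes a single high byte for two of the code pages named above; `decode` is a hypothetical, deliberately incomplete function, not a real decoding API.

```rust
/// Hypothetical, deliberately tiny decoder: it knows printable ASCII plus
/// a single high byte for two code pages. Real tables have 128 high entries.
fn decode(byte: u8, code_page: u16) -> Option<char> {
    match (code_page, byte) {
        // Printable ASCII is shared by both code pages.
        (437, 0x20..=0x7E) | (1252, 0x20..=0x7E) => Some(byte as char),
        (437, 0xE9) => Some('Θ'),  // Greek capital theta in CP437
        (1252, 0xE9) => Some('é'), // e with acute accent in Windows-1252
        _ => None,
    }
}

fn main() {
    for page in [437u16, 1252] {
        println!(
            "CP{}: 0x41 -> {:?}, 0xE9 -> {:?}",
            page,
            decode(0x41, page),
            decode(0xE9, page)
        );
    }
}
```

The same octet is a different character depending on which table the reader assumes, which is why text had to travel with its code page.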
> Unicode is intended to address the need for a workable, reliable world text encoding. Unicode could be roughly described as "wide-body ASCII" that has been stretched to 16 bits to encompass the characters of all the world's living languages. In a properly engineered design, 16 bits per character are more than sufficient for this purpose.
- Microsoft used it in Windows
- Sun used it in Java
- Netscape used it in JavaScript
- The Standard C Library added `wcslen` and friends
- Unicode Transformation Format 16 (UTF-16) arrives
- Unit length != number of characters
- Not ASCII compatible
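Both UTF-16 points can be seen from Rust, which can transcode a `str` with `encode_utf16`. A minimal sketch, with `utf16_units` as a made-up helper name:

```rust
/// Hypothetical helper: number of 16-bit code units needed for a string.
fn utf16_units(s: &str) -> usize {
    s.encode_utf16().count()
}

fn main() {
    let s = "Hi 😀!";
    // The emoji lies outside the first 65,536 code points,
    // so UTF-16 has to spend a surrogate pair on it.
    println!("{} chars, {} UTF-16 units", s.chars().count(), utf16_units(s));

    // Even plain ASCII text changes shape: 'H' becomes the unit 0x0048,
    // which is the bytes 0x48 0x00 in little-endian order.
    let units: Vec<u16> = "Hi".encode_utf16().collect();
    let bytes: Vec<u8> = units.iter().flat_map(|u| u.to_le_bytes()).collect();
    println!("{:02x?}", bytes);
}
```

The interleaved zero bytes are what break 8-bit tools that treat `0x00` as the end of a string.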
- Enter Plan 9 and UTF-8...
- Variable-length encoding
- Can encode any Unicode Scalar Value as one, two, three or four bytes.
- Unit length != number of characters
0b0xxxxxxx
0b110xxxxx 0b10xxxxxx
0b1110xxxx 0b10xxxxxx 0b10xxxxxx
0b11110xxx 0b10xxxxxx 0b10xxxxxx 0b10xxxxxx
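The bit patterns above translate almost line for line into code. The encoder below is a hand-rolled sketch for illustration only (`encode_utf8_by_hand` is a made-up name); real code would call `char::encode_utf8`. The four-byte arm follows the same scheme with a `0b11110xxx` lead byte.

```rust
/// Hand-rolled UTF-8 encoder for a single scalar value (illustrative only).
fn encode_utf8_by_hand(c: char) -> Vec<u8> {
    let cp = c as u32;
    match cp {
        // 0b0xxxxxxx
        0x0000..=0x007F => vec![cp as u8],
        // 0b110xxxxx 0b10xxxxxx
        0x0080..=0x07FF => vec![
            0b1100_0000 | (cp >> 6) as u8,
            0b1000_0000 | (cp & 0x3F) as u8,
        ],
        // 0b1110xxxx 0b10xxxxxx 0b10xxxxxx
        0x0800..=0xFFFF => vec![
            0b1110_0000 | (cp >> 12) as u8,
            0b1000_0000 | ((cp >> 6) & 0x3F) as u8,
            0b1000_0000 | (cp & 0x3F) as u8,
        ],
        // 0b11110xxx followed by three continuation bytes
        _ => vec![
            0b1111_0000 | (cp >> 18) as u8,
            0b1000_0000 | ((cp >> 12) & 0x3F) as u8,
            0b1000_0000 | ((cp >> 6) & 0x3F) as u8,
            0b1000_0000 | (cp & 0x3F) as u8,
        ],
    }
}

fn main() {
    for c in ['A', '\u{00EF}', '€', '😀'] {
        // Cross-check against the standard library's encoder.
        let expected: Vec<u8> = c.to_string().into_bytes();
        assert_eq!(encode_utf8_by_hand(c), expected);
        println!("U+{:04X} -> {:02x?}", c as u32, expected);
    }
}
```

Because a `char` is always a Unicode Scalar Value, the surrogate range never reaches the three-byte arm.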
- POSIX says file names are an array of 8-bit values
- Windows says file names are an array of 16-bit `wchar_t`
- :(
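Rust papers over this split with `OsString` and `OsStr`. A minimal, platform-neutral sketch follows; `display_name` and the file names are made up for illustration.

```rust
use std::ffi::{OsStr, OsString};
use std::path::Path;

/// Hypothetical helper: turn an OS-supplied name into something printable.
/// Anything that is not valid Unicode is replaced with U+FFFD.
fn display_name(name: &OsStr) -> String {
    name.to_string_lossy().into_owned()
}

fn main() {
    // Going from Rust text to an OS string always works...
    let name: OsString = OsString::from("notes.txt");
    // ...but the way back is fallible, because the OS promises less than UTF-8.
    match name.to_str() {
        Some(text) => println!("valid Unicode: {}", text),
        None => println!("not valid Unicode: {}", display_name(&name)),
    }

    // Paths are thin wrappers around OS strings.
    let path = Path::new("/tmp").join(&name);
    println!("{:?}", path.file_name());
}
```

Keeping file names as `OsString` until they have to be shown to a human avoids losing names the OS accepts but UTF-8 does not.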
`String` / `&str` / `"hi"`
- use this by default
`Vec<u8>` / `&[u8]` / `b"hi"`
- use for exchanging data with 8-bit / ASCII systems
`OsString` / `OsStr`
- use for exchanging data with your Operating System
`CString` / `CStr`
- use for exchanging data with 8-bit C APIs
- null-terminated
- Might not be UTF-8
- https://docs.rs/widestring/
- use for exchanging data with 'wide' C APIs
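For the 8-bit C case, the standard library's `CString` and `CStr` cover both properties listed above. The sketch below stays on the Rust side and makes no actual FFI call; `c_len` is a hypothetical helper. Wide strings are left out because they need the external `widestring` crate.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper: length of a C string as `strlen` would report it.
fn c_len(s: &CStr) -> usize {
    s.to_bytes().len()
}

fn main() {
    // Building a CString appends the terminating nul byte.
    let owned = CString::new("hi").expect("no interior nul bytes");
    println!("{:?}", owned.as_bytes_with_nul());
    println!("strlen-style length: {}", c_len(&owned));

    // An interior nul would truncate the string on the C side, so it is rejected.
    println!("interior nul rejected: {}", CString::new("h\0i").is_err());

    // Bytes coming back from C need not be UTF-8, so conversion is checked.
    let latin1 = CStr::from_bytes_with_nul(b"caf\xE9\0").expect("one trailing nul");
    println!("valid UTF-8: {}", latin1.to_str().is_ok());
}
```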