missing emoji substitutions #700

Open
forresto opened this issue Apr 15, 2024 · 9 comments

@forresto
Contributor

Expected Behavior

There are a handful of emoji substitutions that are not found, even after #688 landed.

❤️‍🩹 should render as one glyph. (1433)

Current Behavior

❤️‍🩹 is rendering as 3 glyphs ([ 169, 18, 1345 ]).

Possible Solution

I can make a PR with failing test cases, if that's helpful.

Steps to Reproduce (for bugs)

#️⃣ found sub [ 4, 22 ] 1520
*️⃣ found sub [ 5, 22 ] 1521
0️⃣ found sub [ 6, 22 ] 1531
1️⃣ found sub [ 7, 22 ] 1522
⛹️‍♀️ found sub [ 140, 18, 81 ] 140
⛹️‍♂️ found sub [ 140, 18, 82 ] 140
❤️‍🔥 found sub [ 169, 18, 794 ] 1432
❤️‍🩹 found sub [ 169, 18, 1345 ] 1433

I found these by manually searching the font's ccmp substitutions, like this:

  // All ccmp substitutions in the font; each entry is used below as
  // { sub: [glyph indexes], by: glyph index }.
  const substitutions = font.substitution.getFeature("ccmp");

  const opentypeOptions = {
    kerning: true,
    language: "dflt",
    features: [{ script: "DFLT", tags: ["ccmp", "liga"] }],
  };

  for (const emoji of emojiData) {
    const { unicode } = emoji;
    const glyphs = font.stringToGlyphs(unicode, opentypeOptions);
    let glyph;
    if (glyphs.length === 1) {
      // The substitution was applied: the emoji resolved to a single glyph.
      glyph = glyphs[0];
    } else {
      // Otherwise, look up the glyph sequence in the ccmp substitutions by hand.
      const indexes = glyphs.map((g) => g.index);
      const sub = substitutions.find((substitution) => equals(substitution.sub, indexes));

      if (sub) {
        glyph = font.glyphs.get(sub.by);
        console.log(unicode, "found sub", indexes, sub.by);
      } else {
        console.log(unicode, "no ccmp sub", indexes);
      }
    }
  }

/** Custom equals function that can also check lists. */
function equals(a, b) {
  if (a === b) {
    return true;
  } else if (Array.isArray(a) && Array.isArray(b)) {
    if (a.length !== b.length) {
      return false;
    }
    for (let i = 0; i < a.length; i += 1) {
      if (!equals(a[i], b[i])) {
        return false;
      }
    }
    return true;
  } else {
    return false;
  }
}

Context

Using noto-emoji in our CAD app, https://cuttle.xyz

Your Environment

  • Version used: be0d441
  • Font used: noto-emoji v47
  • Browser Name and version: Node
  • Operating System and version (desktop or mobile):
  • Link to your project:
@Connum
Contributor

Connum commented Apr 15, 2024

@TonyJR would you be available to have a look at this, as you implemented the ccmp feature?

@TonyJR
Contributor

TonyJR commented Apr 15, 2024

> @TonyJR would you be available to have a look at this, as you implemented the ccmp feature?

Yes, I'm trying this.
I found the rule for "#️⃣ found sub [ 4, 22 ] 1520"

sub numbersign uni20E3 by keycap_hash;

It should be a GSUB lookup type 4, format 1 (ligature substitution). I will find out why it is not applied.
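
For reference, a GSUB type 4 (ligature substitution) rule replaces a sequence of glyphs with a single glyph. Below is a minimal sketch of that matching, independent of opentype.js internals. `applyLigatures` is an illustrative name, the `{ sub, by }` rule shape mirrors what the snippet in the report reads from `getFeature("ccmp")`, and the indexes are the ones reported in this issue. Trying the longest rule first is a simplification; a real font orders ligatures by preference within each ligature set.

```javascript
/**
 * Illustrative only: apply ligature rules ({ sub: number[], by: number })
 * to a list of glyph indexes, trying longer rules first at each position.
 */
function applyLigatures(glyphIndexes, rules) {
  // Ignore empty rules, then sort so a longer sequence wins over its prefix.
  const sorted = rules
    .filter((r) => r.sub.length > 0)
    .sort((a, b) => b.sub.length - a.sub.length);
  const out = [];
  let i = 0;
  while (i < glyphIndexes.length) {
    const rule = sorted.find((r) =>
      r.sub.every((g, k) => glyphIndexes[i + k] === g)
    );
    if (rule) {
      out.push(rule.by); // the whole matched sequence becomes one glyph
      i += rule.sub.length;
    } else {
      out.push(glyphIndexes[i]); // no rule starts here, keep the glyph
      i += 1;
    }
  }
  return out;
}

// Indexes taken from the report above.
const rules = [
  { sub: [4, 22], by: 1520 }, // numbersign + uni20E3 -> keycap_hash
  { sub: [169, 18, 1345], by: 1433 },
];
console.log(applyLigatures([4, 22], rules));
console.log(applyLigatures([169, 18, 1345, 7], rules));
```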

@forresto
Contributor Author

forresto commented Apr 15, 2024

Here are the ones that should result in one glyph, but return multiple.

[
{"string":"#️⃣","indexes":[4,23],"expected":1548},
{"string":"*️⃣","indexes":[5,23],"expected":1549},
{"string":"0️⃣","indexes":[6,23],"expected":1559},
{"string":"1️⃣","indexes":[7,23],"expected":1550},
{"string":"2️⃣","indexes":[8,23],"expected":1551},
{"string":"3️⃣","indexes":[9,23],"expected":1552},
{"string":"4️⃣","indexes":[10,23],"expected":1553},
{"string":"5️⃣","indexes":[11,23],"expected":1554},
{"string":"6️⃣","indexes":[12,23],"expected":1555},
{"string":"7️⃣","indexes":[13,23],"expected":1556},
{"string":"8️⃣","indexes":[14,23],"expected":1557},
{"string":"9️⃣","indexes":[15,23],"expected":1558},
{"string":"🏋️‍♀️","indexes":[447,18,82],"expected":447},
{"string":"🏋️‍♂️","indexes":[447,18,83],"expected":447},
{"string":"🏌️‍♀️","indexes":[448,18,82],"expected":448},
{"string":"🏌️‍♂️","indexes":[448,18,83],"expected":448},
{"string":"🏳️‍🌈","indexes":[485,18,256],"expected":1871},
{"string":"🏳️‍⚧️","indexes":[485,18,116],"expected":1872},
{"string":"👁️‍🗨️","indexes":[566,18,886],"expected":1432},
{"string":"👨‍❤️‍👨","indexes":[605,18,170,18,605],"expected":646},
{"string":"👨‍❤️‍💋‍👨","indexes":[605,18,170,18,640,18,605],"expected":644},
{"string":"👩‍❤️‍👨","indexes":[606,18,170,18,605],"expected":646},
{"string":"👩‍❤️‍👩","indexes":[606,18,170,18,606],"expected":646},
{"string":"👩‍❤️‍💋‍👨","indexes":[606,18,170,18,640,18,605],"expected":644},
{"string":"👩‍❤️‍💋‍👩","indexes":[606,18,170,18,640,18,606],"expected":644},
{"string":"🕵️‍♀️","indexes":[855,18,82],"expected":855},
{"string":"🕵️‍♂️","indexes":[855,18,83],"expected":855},
{"string":"⛹️‍♀️","indexes":[141,18,82],"expected":141},
{"string":"⛹️‍♂️","indexes":[141,18,83],"expected":141},
{"string":"❤️‍🔥","indexes":[170,18,795],"expected":1433},
{"string":"❤️‍🩹","indexes":[170,18,1346],"expected":1434},
]
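
A minimal sketch of how the list above could become failing test cases. It assumes only the `font.stringToGlyphs(string, options)` call used in the original snippet. `findFailures` is a hypothetical helper, and the stub font stands in for a loaded noto-emoji font only to make the sketch runnable.

```javascript
/**
 * Hypothetical helper: returns the cases where the font does not produce
 * exactly one glyph with the expected index.
 */
function findFailures(font, cases, options) {
  const failures = [];
  for (const { string, expected } of cases) {
    const indexes = font.stringToGlyphs(string, options).map((g) => g.index);
    if (indexes.length !== 1 || indexes[0] !== expected) {
      failures.push({ string, indexes, expected });
    }
  }
  return failures;
}

// Stub standing in for a real font (an assumption for this sketch): it applies
// no substitution and reproduces two of the index lists reported above.
const stubFont = {
  stringToGlyphs(string) {
    const table = {
      "#\uFE0F\u20E3": [4, 23],
      "\u2764\uFE0F\u200D\u{1FA79}": [170, 18, 1346],
    };
    return (table[string] || []).map((index) => ({ index }));
  },
};

const cases = [
  { string: "#\uFE0F\u20E3", expected: 1548 },
  { string: "\u2764\uFE0F\u200D\u{1FA79}", expected: 1434 },
];
console.log(findFailures(stubFont, cases, {}));
```

With a real font, `stubFont` would be replaced by the loaded font object and `{}` by the `opentypeOptions` from the report; an empty result would mean every sequence resolved to its expected glyph.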

@TonyJR
Contributor

TonyJR commented Apr 15, 2024

I found the reason!
You have input a “fully-qualified” emoji, and the font does not support it.

image

@TonyJR
Contributor

TonyJR commented Apr 15, 2024

WTF! Figma draws it correctly. I'm going to find out why.

@Connum
Contributor

Connum commented Apr 24, 2024

@TonyJR any progress on this?

@TonyJR
Contributor

TonyJR commented Apr 25, 2024

> @TonyJR any progress on this?

Sorry, I've been a bit busy lately. \uFE00-\uFE0F are variation selectors, which should be dealt with in cmap.
I have tested HarfBuzz and it skips these characters.
I see two ways to fix the bug:

  1. Process cmap before processing GSUB, then remove the variation selectors.
  2. Skip them when processing GSUB.

I prefer the first option. @Connum, are you familiar with cmap?
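
A rough sketch of the first option, outside of opentype.js: map code points to glyph indexes first, drop the variation selectors U+FE00 to U+FE0F, and only then hand the glyph list to GSUB. `toGlyphIndexes` and the `cmap` Map are hypothetical. A real implementation would use the selector to pick a variant through the cmap format 14 subtable rather than simply discarding it. The two glyph indexes are the ones reported earlier in this issue.

```javascript
// Variation selectors VS1..VS16 (U+FE00..U+FE0F).
const isVariationSelector = (cp) => cp >= 0xfe00 && cp <= 0xfe0f;

/**
 * Hypothetical helper: cmap lookup that drops variation selectors so they
 * never reach the GSUB stage. Unmapped code points become glyph 0 (.notdef).
 */
function toGlyphIndexes(text, cmap) {
  const indexes = [];
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    if (isVariationSelector(cp)) continue;
    indexes.push(cmap.get(cp) ?? 0);
  }
  return indexes;
}

// Toy cmap with the two glyphs from the keycap example in this issue.
const cmap = new Map([
  [0x23, 4], // numbersign
  [0x20e3, 22], // uni20E3 (combining enclosing keycap)
]);

// "#" + VS16 + keycap, i.e. the fully-qualified form of the keycap emoji.
console.log(toGlyphIndexes("#\uFE0F\u20E3", cmap)); // [ 4, 22 ]
```

With the selector gone, the glyph sequence matches the `sub numbersign uni20E3 by keycap_hash` rule quoted above.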

@Connum
Contributor

Connum commented Apr 25, 2024

I implemented a special handling of variation selectors some time ago, maybe that's interfering? And the order of processing should be stated in the docs. As far as I remember, cmap should be handled before any layout is applied.

@TonyJR
Contributor

TonyJR commented Apr 25, 2024

Yes, you are right. I'm trying to find the order, but I would prefer to refer directly to the HarfBuzz source code.
I also found that HarfBuzz actually merges and processes the functions in GSUB/GPOS together. Perhaps we should follow that as well, but this may be a big project...
