Autogenerate event pages #592

Merged

Contributor

@tovmasharrison tovmasharrison commented Jan 2, 2024

Description

The events can be autogenerated by adding WMT events to wmt_events.json and the rest to events.json. I have tried to automate the most commonly added information about the events. However, specific information that is not automated can be added manually to the markdown file after generating the event.

The _events.json and _wmt_events.json files contain an empty template of every field that can optionally be automated. I have included them for reference.

Also, I have created a list to show all the MT Summit, EAMT, and AMTA events separately at the top of the page. This closes that part in #584.

How it works:

  • Validation: After the events are added to the corresponding .json file, generate.py validates the entries and creates a file inside the events/ directory. For external events, or for events that do not need their own page, the "id" key-value pair should be omitted. Examples are AMTA 2024, AmericasNLP, and WMT24.

  • events.md: An appropriate header is created based on the event's startDate, and each event is listed under the corresponding header appropriately. For example, if the event starts in 2024 or 2025, it will be listed under 2024 Events or 2025 Events.

  • calls-for-papers.md: If the event has a call for papers, the "callsForPapersDeadline": "some_date" key-value pair must be given in the .json file; otherwise the event is not added to calls-for-papers.md.

  • Bold/unbold dates and names based on the deadline: Names and dates are bold until the deadline has passed, after which they are automatically unbolded. The check runs each time the website is generated, by comparing the endDate with the current date.

  • Past/future tense: When adding the event to the .json file, the future and past tenses can be added simultaneously. For the future tense, it can be added as "openingParagraph": "The event will take place", for the past tense, it can be added as "pastOpeningParagraph": "The event took place", and the appropriate tense will be automatically shown with the same bold/unbold logic mentioned above.

  • Syntax: The markdown syntax should be replaced with HTML tags inside the .json file.
    <strong>Some word</strong> should be used to bold the name.
    <a href='/link'>some_word</a> should be used to add a link.
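
The listing rules above could be sketched roughly as follows. This is an illustration, not the code in generate.py: the function names are hypothetical, dates are assumed to be ISO `YYYY-MM-DD` strings, and only the `name`, `startDate`, `endDate`, `openingParagraph` and `pastOpeningParagraph` keys come from the description.

```python
from collections import defaultdict
from datetime import date


def is_past(event: dict, today: date) -> bool:
    """An event counts as past once its endDate is strictly before today."""
    return date.fromisoformat(event["endDate"]) < today


def pick_opening_paragraph(event: dict, today: date) -> str:
    """Use the past-tense paragraph after the event, else the future-tense one."""
    if is_past(event, today):
        # Fall back to the future-tense text if no past-tense text was given.
        return event.get("pastOpeningParagraph") or event.get("openingParagraph", "")
    return event.get("openingParagraph", "")


def format_name(event: dict, today: date) -> str:
    """Bold upcoming events; leave past ones plain."""
    name = event["name"]
    return name if is_past(event, today) else f"<strong>{name}</strong>"


def group_by_year(events: list[dict]) -> dict[int, list[dict]]:
    """Group events under the year of their startDate, newest year first."""
    groups = defaultdict(list)
    for event in events:
        groups[date.fromisoformat(event["startDate"]).year].append(event)
    return dict(sorted(groups.items(), reverse=True))
```

Passing `today` in explicitly, rather than calling `date.today()` inside each function, keeps the behaviour reproducible when the site is regenerated in tests.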

Please let me know if you have any questions or suggestions.

Fixes #563

Checklist:

**event
}

# Doesn't create a file if it's an external event, as external events should not have an id
Collaborator

What's an example of an "external" event?

Contributor Author

For example, AmericasNLP. We don't have a page for it but we link the event externally.

Collaborator

@bittlingmayer bittlingmayer Feb 8, 2024

Alright, let's say

# Doesn't create a file if it's an event we don't have a page for, which should not have an ID.

generate.py Outdated
name = event['name']

# Start and end dates are necessary for listing the events on events.md
if not isinstance(event['startDate'], str):
Collaborator

Really we should be checking that it's parsed as a date.
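
A minimal sketch of what such a check might look like, assuming dates are stored as ISO 8601 strings such as `2024-06-24`. The helper name `parse_event_date` is hypothetical and not part of generate.py.

```python
from datetime import date


def parse_event_date(event: dict, key: str) -> date:
    """Return event[key] as a date, or raise ValueError naming the event and field."""
    label = event.get("name", "<unnamed event>")
    value = event.get(key)
    if not isinstance(value, str):
        raise ValueError(f"{label}: {key} must be a string, got {value!r}")
    try:
        # fromisoformat rejects impossible dates such as 2024-02-30.
        return date.fromisoformat(value)
    except ValueError as err:
        raise ValueError(f"{label}: {key}={value!r} is not a valid ISO date") from err
```

Compared with the `isinstance(..., str)` check alone, this also catches strings that are not dates at all, and the error message says which event and which field to fix.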

generate.py Outdated
if not isinstance(event['startDate'], str):
raise Exception(event['startDate'])

if not isinstance(event['endDate'], str):
Collaborator

Maybe if endDate is missing, we assume it's just a single day event?

Contributor Author

Yes we can
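
One way the single-day fallback could look, as a sketch only. The function name `event_date_range` is hypothetical, and ISO `YYYY-MM-DD` strings are assumed for both fields.

```python
from datetime import date


def event_date_range(event: dict) -> tuple[date, date]:
    """Return (start, end); a missing or empty endDate means a single-day event."""
    start = date.fromisoformat(event["startDate"])
    end_raw = event.get("endDate")
    end = date.fromisoformat(end_raw) if end_raw else start
    if end < start:
        raise ValueError(f"endDate {end} is before startDate {start}")
    return start, end
```

Treating an empty string the same as a missing key matters here because the template files list `"endDate":""` by default.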

"location":"",
"startDate":"",
"endDate":"",
"date":"",
Collaborator

Do we need this in addition to startDate?

Contributor Author

I added a separate date for cases where we specify the date in words.
Screenshot from 2024-01-05 11-16-51

Collaborator

Hi @tovmasharrison !

Is there a way to have events whose date is unknown (and thus have no info in the date column) manually placed in a certain row?

Right now, I have the WMT24 example, which appears in the middle of the table.

WMT events are usually at the end of the year, so I placed it as the first event because they are organized in reverse chronological order.

Screenshot 2024-01-24 at 09 06 19

Contributor Author

Hi!

I'll take a look.

Contributor Author

Hi @cefoo,

Since we are auto-generating the events, there wasn't a straightforward way to place the WMT24 event manually at the top. However, I provided an approximate date for it based on the rest of the WMT events so it can be at the top until the actual date becomes known.

Please let me know if that solution is ok.

Collaborator

@liashahnazaryan has just created the WMT24 page, with a date in December.

However, it would be good to ensure WMT stays on top in future years, so this could be a good practice moving forward. WMT is usually in December, so I guess we could just put December 31st until we know the real date. However, that date should be hidden until we know the real one.
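
The "placeholder date that sorts but is not shown" idea could be sketched like this. The `dateConfirmed` flag and the `build_rows` function are hypothetical and do not exist in the PR; ISO `YYYY-MM-DD` strings are assumed.

```python
from datetime import date


def sort_key(event: dict) -> date:
    """Sort on startDate, whether it is confirmed or only a placeholder."""
    return date.fromisoformat(event["startDate"])


def build_rows(events: list[dict]) -> list[tuple[str, str]]:
    """Return (name, displayed date) rows, newest first.

    Events marked with dateConfirmed=False keep their position in the
    ordering but get an empty date column.
    """
    rows = []
    for event in sorted(events, key=sort_key, reverse=True):
        shown = event["startDate"] if event.get("dateConfirmed", True) else ""
        rows.append((event["name"], shown))
    return rows
```

With a placeholder of `2024-12-31` and `dateConfirmed` set to false, WMT24 would sort above the other 2024 events while its date cell stays blank.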

"linkUrl":[
""
],
"impDatesHeader":"",
Collaborator

What is this?

If imp stands for "important", let's just write importantDatesHeader.

Collaborator

I think we just want:

"importantDates": [ {
        "name": "...",
        "date": "..."
    },
    ...
]

"blockquoteSpeakersHeader":"",
"speakHeadcontent":[
{
"bshOpeningSentence":"",
Collaborator

?

""
],
"blockquoteSpeakersHeader":"",
"speakHeadcontent":[
Collaborator

Content with uppercase

"location":"Chicago, Illinois"
},
{
"name":"<a href='https://turing.iimas.unam.mx/americasnlp/2024_workshop.html'>AmericasNLP</a>",
Collaborator

I don't think there should be a URL in the name.

{
"name":"",
"id":"",
"title":"",
Collaborator

Do we need both name and title?

"date":""
}
],
"bulletKeyNoteSpeakOrTopsHeader":"",
Collaborator

Again, we should have this.

Instead of "bulletKeyNoteSpeakOrTopsHeader" and "names", we need just

"organisers": [
   {
       "name": "...",
       "institution": "..."
   },
   ...
]

"url":""
}
],
"sharedTasksHeader":"",
Collaborator

Isn't this only for WMT?

"optionalSentence":""
}
],
"seo":{
Collaborator

This should not be in JSON.

It should get generated in markdown.

@bittlingmayer
Collaborator

The JSON schema needs to be way more minimal.

We want to reduce the burden on people adding events.

It's fine if we harmonise a bit how events are displayed.

@bittlingmayer
Collaborator

Having a JSON schema is a good idea.

We should just use something like https://json-schema.org/, and then we can quickly run a validator.

Instead of using ad-hoc schema format, and then still needing dozens of if-statements.
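
As a sketch of that approach, a JSON Schema can be written once and run through the third-party `jsonschema` package (installed separately with `pip install jsonschema`). The schema below is an assumption for illustration: it uses field names mentioned in this PR, but which fields are required is a guess, and `validate_events` is a hypothetical helper.

```python
ISO_DATE = r"^\d{4}-\d{2}-\d{2}$"

# Hypothetical schema: field names follow this PR, the constraints are assumed.
EVENT_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "array",
    "items": {
        "type": "object",
        "required": ["name", "startDate"],
        "properties": {
            "id": {"type": "string"},
            "name": {"type": "string", "minLength": 1},
            "startDate": {"type": "string", "pattern": ISO_DATE},
            "endDate": {"type": "string", "pattern": ISO_DATE},
            "location": {"type": "string"},
        },
        "additionalProperties": False,
    },
}


def validate_events(events: list) -> list[str]:
    """Return one readable message per schema violation (empty list if valid)."""
    from jsonschema import Draft7Validator  # third-party dependency

    validator = Draft7Validator(EVENT_SCHEMA)
    return [
        f"{'/'.join(str(part) for part in error.absolute_path)}: {error.message}"
        for error in validator.iter_errors(events)
    ]
```

Collecting every error with `iter_errors`, instead of stopping at the first one, lets a contributor fix all problems in an events file in one pass.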

@tovmasharrison
Contributor Author

tovmasharrison commented Feb 8, 2024

Hey @bittlingmayer,

I made quite a few changes.

  • Changed the structure of the JSON files for adding events

  • Created a separate calls_for_papers.json file for adding external events that have a Calls For Papers

  • Created a JSON Schema to validate events.json, calls_for_papers.json and wmt_events.json

  • When manually adding events, I changed it so it can be added in markdown instead of HTML tags. Also, I changed the loops for generating events to Markdown + Liquid.

  • The date range is being automatically generated.

  • SEO is generated from generate.py.

  • Removed Organizers columns from calls_for_papers.md.

  • Provided examples of what each property is inside _events.json, _wmt_events.json, _calls_for_papers.json.

Also, I have created a WMT Test event so we can see what it will look like. If everything is fine, I'll remove it.

Please let me know what you think.

"type":"Person",
"url":"http://turing.iimas.unam.mx/americasnlp/st.html"
},
"futureTenseOpeningParagraph": "The <strong>Fourth AmericasNLP Competition</strong> will take place online in June, at NAACL 2024 in Mexico City. The competition focused on creating machine translation systems for indigenous languages from the Americas.",
Collaborator

Ideally we'd avoid this.

Also should we use Markdown instead of HTML?

Contributor Author

Should we just remove the future/past tenses and keep only opening_paragraph? That way, though, it would have to be edited manually once the event has happened.

Well, each event page is rendered through HTML. I don't think there is a straightforward way to use Markdown syntax inside HTML?

"speakers": [
{
"type": "Keynote speakers",
"about": [
Collaborator

feels wrong

i'd expect something more like this:

"speakers": [
       { name: "Graham Neubig", type: "keynote" },
       { name: "Jaime Pérez González", type: "keynote" },
       "Manuel Mager",
       "Abteen Ebrahimi",
       "Shruti Rijhwani",
       "Arturo Oncevay",
       "Luis Chiruzzo",
       "Robert Pugh",
       "Katharina Kann"
]

though not sure if the schema will allow either string or object

Contributor Author

The schema won't allow this since an array should contain elements of the same type.

Probably something like this can do:

"speakers": [
    { "name": "Graham Neubig", "type": "keynote" },
    { "name": "Jaime Pérez González", "type": "keynote" },
    { "name": "Manuel Mager" },
    { "name": "Abteen Ebrahimi" },
    { "name": "Shruti Rijhwani" },
    { "name": "Arturo Oncevay" },
    { "name": "Luis Chiruzzo" },
    { "name": "Robert Pugh" },
    { "name": "Katharina Kann" }
]

@bittlingmayer, What do you think?
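
For reference, JSON Schema can describe an array whose items are either strings or objects by using `anyOf` under `items`. If the mixed form were wanted, one option would be to accept both shapes in the JSON and normalise them before rendering. The sketch below assumes that approach; `normalize_speakers` is a hypothetical helper, not something in this PR.

```python
def normalize_speakers(speakers: list) -> list[dict]:
    """Turn bare name strings into {"name": ...} objects so every item has one shape."""
    normalized = []
    for speaker in speakers:
        if isinstance(speaker, str):
            normalized.append({"name": speaker})
        elif isinstance(speaker, dict) and "name" in speaker:
            # Copy so the caller's data is not modified later by accident.
            normalized.append(dict(speaker))
        else:
            raise ValueError(f"Unsupported speaker entry: {speaker!r}")
    return normalized
```

This keeps the JSON short for people adding events while the templates only ever see objects with a `name` key.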

languages/emj.md Outdated
name: Yandex Translate
supported_qe_apis: []
seo:
name: Machine translation for None
Collaborator

Seems like a bug

Contributor Author

Yes, I'll fix that with a separate PR.

Contributor Author

Fixed in #598.

@bittlingmayer
Collaborator

Yes, with schema is better!

Made a few comments, but today is busy with the community meetup, will look more on the weekend.

@tovmasharrison
Contributor Author

tovmasharrison commented Feb 23, 2024

@bittlingmayer

I have added a form submission at http://127.0.0.1:4000/add_event that returns a JSON version of the inputs, which can be copied and pasted inside the events.json / wmt_events.json / calls_for_papers.json files.

I have reverted the structure for speakers, important_dates, multiday_schedule, and one_day_schedule to their previous states since their input boxes were being represented as textarea. See below.
Screen Shot 2024-02-22 at 13 53 20

Also, the WMT Test event is still included for reference. I'll remove it once everything looks good.

Let me know what you think.

type: Organization
name: European Association of Machine Translation
url: https://eamt.org
start_date: '2024-06-24'
Collaborator

I think we can print these without quotes.

Collaborator

Only if it is easy - I know it is probably just the default in the conversion to YAML.

@bittlingmayer bittlingmayer merged commit e48ab07 into machinetranslate:master Mar 3, 2024
2 checks passed

Successfully merging this pull request may close these issues.

Autogenerate Event pages
3 participants