Check consistency WD <-> SWERIKS #27

Open
salgo60 opened this issue Nov 4, 2024 · 13 comments

Comments

@salgo60

salgo60 commented Nov 4, 2024

I guess WD has some issues with father and son.

I guess it's easier if you check whether we are in sync, as I haven't seen an API for that.

Issues I think I found (or is it just a mess with the formatter URL?):

  1. i-3CgnFW8AAQykoiHKz1eRJA is on
image
  1. The formatter URL (P1630) that Wikidata has doesn't work --> Property:P12192#P1630
image
@BobBorges
Contributor

Thanks!
The ID in question seems to be sort of "stray" -- there's only a non-primary name and an i-ort in our data set.

image

We have two guys with that surname and the same i-ort -- Edward Magnus and Gustaf Robert. Assuming there's no third Seederholm i Ålberga gård, the ID shouldn't be associated with anything.

@salgo60
Author

salgo60 commented Nov 4, 2024

I am lost

  1. The formatter URL in WD for the SWERIK property is not correct: Property:P12192#P1630
    1. If you change it, I think it takes some time to propagate, as it's cached everywhere... I once wrote a script touching all related items, but I don't know where it is.
  2. I deleted the ID from the son Edvard Sederholm (Q21046584) - logfile
image

@BobBorges
Contributor

BobBorges commented Nov 4, 2024

It should be the second one and it works most of the time recently.

image

> I deleted the ID from the son Edvard Sederholm (Q21046584) - logfile

This solves the issue on WD end? I will remove these stray IDs from SWERIK.

@salgo60
Author

salgo60 commented Nov 4, 2024

> This solves the issue on WD end? I will remove these stray IDs from SWERIK.

Great! I am on my way to ice skate, but I will also try to do a link check from the WD side.

I'll close this issue.

@salgo60 salgo60 closed this as completed Nov 4, 2024
@salgo60
Author

salgo60 commented Nov 4, 2024

@BobBorges my vision when your project started was that you would have better landing pages that add value if Wikipedia linked to them, e.g.

  • for a PM, links to documents associated with the PM
  • for a party, data associated with the party
    • seats in the PM
    • ?!??

I tried to tell the sv:Wikipedia people that this is a big opportunity when a research platform has "same as Wikidata"... I felt no one was interested. My feeling is that people on sv:Wikipedia like doing cut and paste and are not interested in data-driven solutions...

I hope I am wrong, and also that your work could be easier to consume with better landing pages. MAYBE we could link SWERIKs from sv:Wikipedia...

The way I have populated WD and sv:Wikipedia is that PMs use a data-driven template, Mall:Faktamall_biografi_WD, i.e. we can just change the template and we will have links to SWERIKS for all PMs with that template.

@salgo60
Author

salgo60 commented Nov 5, 2024

I'm stepping back for now ;-)

Looks like Wikidata blocked me ;-) It seems to be related to a user named LevandeMänniska, who has been pushing to block me because some of the descriptions are inaccurate (see "Ännu fler förvirrade beskrivningar", i.e. "Even more confused descriptions"). I'm not sure what's the best way forward, but I see the description field primarily as a way to disambiguate objects... Out of the 1 million edits I've made, I suppose there are bound to be some errors. I tried reaching out to him by giving him my phone number, but got no response...

Lesson learned: I now view Wikidata as a proof of concept for linked data, providing the opportunity to add sources. However, as it's an open platform, anything can happen. Many users on Wikidata lack professional backgrounds in data management or structured data solutions, resulting in more chaos than structure. Additionally, learning linked data feels like a trial-and-error process, and what was considered correct last year may no longer be the best approach today. Therefore, while an open platform can serve as a useful sandbox, it’s not something you can fully trust, as many self-nominated experts often pursue their own agendas—like user LevandeMänniska, who has been trying to block me for months, even though I’ve mostly stopped editing Wikidata, aside from a few BBQ locations here and there.

  • you state in "Add profession metadata to the MP database #26" that you recommend people use Wikidata data for occupations instead of creating your own data. I feel that is not the way forward, as you never know how Wikidata will develop, and you can also be blocked ;-)
    • the good thing is that most people on WD don't do mass updates, so it will take time until good things disappear...

if you have question you can call me ++46705152802 or salgo60@msn.com

over and out, and good luck navigating the Wikidata landscape! It took me 8 years before I appeared on the radar of user Levande Människa and ended up getting banned....

  • Nota Bene The engagement of contributors enhancing information about "Riksdagen" on Wikidata appears to have declined. Discussions within the Swedish Wikidata community have highlighted challenges in updating the platform with new documents from the Riksdag, indicating a potential gap in necessary skills. This situation raises concerns about the sustainability and accuracy of such data on Wikidata
    • It might indeed be a good time to take action and preserve as much quality data from Wikidata (QWD) as possible, especially if the future of certain datasets, is becoming uncertain...

@salgo60
Author

salgo60 commented Nov 5, 2024

I did a small link check of SWERIK in a notebook on PAWS

Number of valid URLs: 6109
Number of invalid URLs: 68

Errors found
WD Q4934552 - https://swerik-project.github.io/person-catalog/i-PCZrYEHwPaEeNTZphEsWTv
WD Q4957371 - https://swerik-project.github.io/person-catalog/i-31gPpUoSm7zqzQckVmfPGy
WD Q4970175 - https://swerik-project.github.io/person-catalog/i-UX4D3JJdrTjFBf2zyfHx5t
WD Q4976825 - https://swerik-project.github.io/person-catalog/i-NvxzaU2RSok83zCskNAuhg
WD Q117223085 - https://swerik-project.github.io/person-catalog/i-EQM2NLR1fbN9izUQhjTRGR
WD Q97971262 - https://swerik-project.github.io/person-catalog/i-RH6VCPhyxs9yYcfXJzPxYT
WD Q97971276 - https://swerik-project.github.io/person-catalog/i-Cdgsqn4Ts9WMwbjXcE4537
WD Q98271639 - https://swerik-project.github.io/person-catalog/i-x1CuoKmRHYgQr9i2kh3B5
WD Q98538839 - https://swerik-project.github.io/person-catalog/i-TUyWWYGDFXW92GhiG3CLwF
WD Q98937434 - https://swerik-project.github.io/person-catalog/i-EzcxskgMAVbnq8hM2F2km9
WD Q98937482 - https://swerik-project.github.io/person-catalog/i-HYFwSCrwnemwyJTLMcyqvN
WD Q5802544 - https://swerik-project.github.io/person-catalog/i-EtThq89KCE79SrwT9ppHwa
WD Q6001491 - https://swerik-project.github.io/person-catalog/i-S3CBCc7cXNPRWXt4kT1Nn
WD Q5779581 - https://swerik-project.github.io/person-catalog/i-BibwVxLqqeX5rUkp4qZsoT
WD Q5779691 - https://swerik-project.github.io/person-catalog/i-AvyNgUr5vHb4YSPYHYNoDf
WD Q5891553 - https://swerik-project.github.io/person-catalog/i-M8wzDjdnp3v1kx7mCnhrnz
WD Q5930843 - https://swerik-project.github.io/person-catalog/i-PPwk8GX9Ac1MMgY78vBnxU
WD Q5931248 - https://swerik-project.github.io/person-catalog/i-8q84CfWpoFkjGhrjKmh5nV
WD Q5973676 - https://swerik-project.github.io/person-catalog/i-W5KTkCsx6UQycN1fck4krq
WD Q6015512 - https://swerik-project.github.io/person-catalog/i-657md5LkCsjE6B2F6cMUFR
WD Q6026925 - https://swerik-project.github.io/person-catalog/i-UMxTFnyXFG1sA9nuaawcTn
WD Q6043619 - https://swerik-project.github.io/person-catalog/i-EPD5BJ5xvWKMLybqidZ7xr
WD Q6045631 - https://swerik-project.github.io/person-catalog/i-XA4KxPbJJcoEq2kHZmGBg8
WD Q6054405 - https://swerik-project.github.io/person-catalog/i-KYcNGH8TDrzXp5RkXfVxcZ
WD Q6070153 - https://swerik-project.github.io/person-catalog/i-SoPKUW6bamDYhSJ8r5kbfm
WD Q6083505 - https://swerik-project.github.io/person-catalog/i-Kz6LFDnXFaN9pxampQmtys
WD Q6151281 - https://swerik-project.github.io/person-catalog/i-EQ1EaRBTJC4gvBVqD6F6QS
WD Q6186524 - https://swerik-project.github.io/person-catalog/i-UVmEkRLtur2TYHixo3YS36
WD Q6228284 - https://swerik-project.github.io/person-catalog/i-NEzajeS8oXAKC4PXqeHZNT
WD Q6244276 - https://swerik-project.github.io/person-catalog/i-7eaDwLCH46J5Z48Agp4bDd
WD Q6257688 - https://swerik-project.github.io/person-catalog/i-YSxWozeNBai9QXW24ThZk2
WD Q792307 - https://swerik-project.github.io/person-catalog/i-6FzAA1fd4V1GWFU8UEjDM9
WD Q97104614 - https://swerik-project.github.io/person-catalog/i-F9yiexrfiaMq7XRkN2UQtm
WD Q96758042 - https://swerik-project.github.io/person-catalog/i-Li7xEjG4CU6Q9Kayu1A6JD
WD Q97386321 - https://swerik-project.github.io/person-catalog/i-JDzNUwA9QaroyEei8swjky
WD Q97824066 - https://swerik-project.github.io/person-catalog/i-4o1RM4T3EmDZc7uLvsoLiC
WD Q3352340 - https://swerik-project.github.io/person-catalog/i-Y7HHuSEZsgc8ayVQEsVKs9
WD Q6015181 - https://swerik-project.github.io/person-catalog/i-F5Lo79KEGCBu1choqsfsAZ
WD Q60971016 - https://swerik-project.github.io/person-catalog/i-8RqA5Vq57Dp8X1YMfWXXz1
WD Q47067977 - https://swerik-project.github.io/person-catalog/i-BCDpWeGcyN6FUwwXHRDSyd
WD Q19976148 - https://swerik-project.github.io/person-catalog/i-F8n5AiCeSxhtfcXwu7PkYD
WD Q6196285 - https://swerik-project.github.io/person-catalog/i-GSjyw1eeZNrEr8Uk3Wy79K
WD Q117289330 - https://swerik-project.github.io/person-catalog/i-W4ytnPuPTvRtJf3k6ST5af
WD Q116162237 - https://swerik-project.github.io/person-catalog/i-EZYMWS6pSZNPSxi4996Lpc
WD Q116916 - https://swerik-project.github.io/person-catalog/i-soGG7WvpfsE45txj7YR3j
WD Q16650562 - https://swerik-project.github.io/person-catalog/i-6F9rS1XcW3FrTADBP2ew1K
WD Q18274740 - https://swerik-project.github.io/person-catalog/i-UFUisxxPnKCE3asVJtR1C6
WD Q26202 - https://swerik-project.github.io/person-catalog/i-Xvpu7KtsFhUijgkbtWpCVM
WD Q2694124 - https://swerik-project.github.io/person-catalog/i-GVHobKxNYcHjgVszvkcndc
WD Q38773508 - https://swerik-project.github.io/person-catalog/i-RE5rAQ194rSt7bN8ZGzmSk
WD Q4569362 - https://swerik-project.github.io/person-catalog/i-65tDQ1Kb8spvfcwmsyYib7
WD Q4992085 - https://swerik-project.github.io/person-catalog/i-9FLyBDaVeYA1bbxCdRsmNS
WD Q52924 - https://swerik-project.github.io/person-catalog/i-SQxvy2ue6orrTGivt4nDBE
WD Q52925 - https://swerik-project.github.io/person-catalog/i-5MnwqH2UtehSx7EbLGDQMA
WD Q52926 - https://swerik-project.github.io/person-catalog/i-3be6RBChcyBubPmFEyzLuZ
WD Q52927 - https://swerik-project.github.io/person-catalog/i-NBgv74Z6fFc4kgB87Q5s3i
WD Q5499466 - https://swerik-project.github.io/person-catalog/i-68qKbvhEHER4C2TRzVi2T9
WD Q5547623 - https://swerik-project.github.io/person-catalog/i-VoSy23Ve5KQBpG4mSX9qdp
WD Q5553946 - https://swerik-project.github.io/person-catalog/i-Q58Ze7TxTyB6TSL3tKfeoK
WD Q5585712 - https://swerik-project.github.io/person-catalog/i-JWaAdR37r5gFr2SYf1P6zG
WD Q5585717 - https://swerik-project.github.io/person-catalog/i-ELDAUPfRgFPiVy5G37SrBn
WD Q5605987 - https://swerik-project.github.io/person-catalog/i-K6cT1SiaPMfvDc7UQducRL
WD Q5615448 - https://swerik-project.github.io/person-catalog/i-JtSCjZhbn7kKtLncnKjNjs
WD Q5620967 - https://swerik-project.github.io/person-catalog/i-34QhxUMskSZttM6WAtP9fu
WD Q5715090 - https://swerik-project.github.io/person-catalog/i-693RFVRzxr1MXjzetwbKzY
WD Q5724152 - https://swerik-project.github.io/person-catalog/i-21sS3832F96xjNFhsY9x2i
WD Q5773319 - https://swerik-project.github.io/person-catalog/i-ADcRhddZxegj2BX4Abux5i
WD Q5779321 - https://swerik-project.github.io/person-catalog/i-6R7CFRqLrZfQGAGpRxRZmq
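
A check like the one above can be sketched roughly as follows. This is a minimal sketch and not the actual PAWS notebook: the formatter URL is inferred from the URLs in the error list, `build_url`, `http_status` and `check_links` are hypothetical names, and the (QID, SWERIK ID) pairs are assumed to have been fetched from Wikidata beforehand.

```python
import urllib.error
import urllib.request

# Assumed formatter URL, inferred from the URLs in the error list above.
FORMATTER_URL = "https://swerik-project.github.io/person-catalog/$1"


def build_url(swerik_id, formatter=FORMATTER_URL):
    """Substitute the identifier for the $1 placeholder, as a formatter URL (P1630) does."""
    return formatter.replace("$1", swerik_id)


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails entirely."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, TimeoutError):
        return None


def check_links(pairs, fetch_status=http_status):
    """Split (qid, swerik_id) pairs into valid and invalid lists of (qid, url)."""
    valid, invalid = [], []
    for qid, swerik_id in pairs:
        url = build_url(swerik_id)
        # Only a plain 200 counts as valid; anything else is reported.
        (valid if fetch_status(url) == 200 else invalid).append((qid, url))
    return valid, invalid
```

Printing each entry of `invalid` as `f"WD {qid} - {url}"` gives a report in the same shape as the list above. Passing `fetch_status` as a parameter keeps the classification logic testable without network access.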

@BobBorges
Contributor

Thanks Magnus!

This is very helpful, and I'm sorry you got blocked on WD! How can an open data platform block people who aren't blatantly abusive?

I'm going to reopen this issue so I remember to look into these cases on our end.

@BobBorges BobBorges reopened this Nov 5, 2024
@salgo60
Author

salgo60 commented Nov 10, 2024

> I'm sorry you got blocked on WD!

No problem for me. Now you have to make the changes... 😅 It doesn't feel like the user who was getting me blocked has an interest in helping your project... I checked him, and he is editing Wikidata while being paid by the Swedish Public Employment Service. It feels like he doesn't have a vision of how to create an ecosystem; it's more single edits and chasing people like me. I tried to speak with him about edits he had made that did not follow the way I use "described by source", which I think adds value and makes it easier to get an overview (link). After that, an anonymous user started deleting things and argued, using the same logic, that URLs are preferred over source descriptions, and began deleting property values. (It could be the same user editing anonymously to make it appear as if multiple users agree with him. The user was active for only a few hours but made edits with Preferred rank, which shows this user has done some editing before 😺 and confirms that he understands Swedish. It is not the first time this happens on Wikidata, see sockpuppetry.) ;-)

It would have been better if he could move forward together with you.

FYI: I restarted some cleaning of Wikidata when you pinged me 2 weeks ago. In the report Wikidata:Database reports/Constraint violations/P12192 I found a list of about 100 WD profiles that need more care...

> How can an open data platform block people who aren't blatantly abusive?

How can an open platform produce something usable? ;-) It's not a group of rocket scientists; it's more older men with their own agendas... or teenagers with too much time... My take is that Wikipedia <-> Wikidata is a good technical combination, but the community is not skilled enough to produce more advanced linked data... The potential of the data model of Wikidata is, I feel, "the rub".

When you started Riksdagens Corpus, some people were doing good work in Wikidata. I did a small check on the people I presented for your project on Dec 14, 2022, who were doing things with Swedish PMs and Riksdagen's documents ("Sveriges Riksdag 1867–2022: Ett ekosystem av länkad öppen data #84"), and in 2024 it looks like just one of them, Ainali, is still working on making WD better for Swedish PM data. But now Ainali is handling it record by record rather than doing "mass imports", and he lacks programming skills... so I guess WD for the Swedish Parliament will degenerate... Maybe they will address it (see the note from 20 Oct 2024), but I guess not: user Popperipopp did a lot of work in 2020, and you need some data skills to fix it...

image

portrattarkiv.se - SPA

A key success factor was also the work Omar did in scanning all the pages of the book Tvåkammar-riksdagen 1867–1970 and organizing it by 4355 individual persons. Without this work, it would not have been easy to link facts to the book. However, there might be room for improvement, as it turns out some individuals appear multiple times with contradictory data, and there is a certain "simplification" in how independent politicians are described.

image

Most of the pictures also have "same as Wikidata", e.g.

image image image image image image image
  • Published in P1433 - wikidata Q116445396 = Porträttbok: Riksdagsmän 1894
image image image image image image

The major problem I see with Wikidata

The major problem I see with Wikidata is that we have the competence to describe people but have problems creating good WD objects for something like "vilde" (an independent politician). BUT having WD objects for church parishes and for Swedish PMs since 1885, and having them for nearly all countries through Wikidata:WikiProject every politician, is magic. As I stated, I think the research community could learn some good and some bad patterns from the Wikimedia people, and perhaps with an academic community it would be easier to reach consensus compared to navigating a group of older, often combative individuals who rarely meet in person 😊 and not everyone has a formal education...

I feel it's sad that neither Wikidata nor your project could handle errors in the book Tvåkammar-riksdagen 1867–1970 #157 as linked data... doing something like what I did with contradicting sources such as Riksarkivet / church books / ... see #35

The major benefits I see with Wikidata

  • we always think about supporting more languages
  • good or bad? we always run into LevandeMänniska problems 😄, where some user thinks that e.g. the description field in Wikidata is worth fighting and killing for. The odd thing is: why not just update it instead? The fight seems more important, I think 😞 My feeling: LevandeMänniska has > 4 % reverts 🥇, 8 times more than pages created... which tells something about that user's focus...
  • Wikidata offers versioning history and tools for managing abuse, but its open nature and the fact that contributors rarely meet in person make it challenging to accomplish more advanced tasks
  • with the Wikidata background they created Wikibase and then extended Wikibase to also support pictures in the SDC project (video)
image image
  • Structured data on Commons is multilingual information about a media file --> its a good match with Wikidata

I feel there are many lessons to take from this, one being that Wikidata isn’t a platform that research professionals can fully trust as stable. Another important point is the need for creating our own Persistent Identifiers for all referenceable objects, as highlighted in FAIR data principles, particularly F1. When I discussed this with Pelle Snickars from your project, he believed Wikidata Q numbers were sufficient.

However, as we’re seeing, Wikidata is far from stable. It's essential to have your own persistent identifiers backed by your own sources to confirm statements. In the long run, you'll also need to develop your platform to manage conflicting sources effectively, ensuring data reliability and accountability. I guess PROV is a step in the right direction, but it should be implemented alongside other data governance practices, quality control measures, and security protocols to build a complete framework for reliable and accountable data handling. These are things I don't see today at Riksarkivet, RAÄ, or the Swedish "Riksdagen"...

@salgo60
Author

salgo60 commented Nov 11, 2024

More backup candidates

As new contributors with different perspectives and skill sets begin adding to Wikidata, the data is likely to evolve, leading to "scope creep." This means that perceptions of what constitutes good quality or a reliable source may shift—potentially for the better, but also possibly for the worse.

Managing an open platform with over 13 000 active contributors each month, supporting more than 200 languages, is undoubtedly challenging. In this context, ensuring regular "backups" of the data is a commendable and proactive measure.

image image

From my six years of experience editing Wikidata, I’ve learned that the challenges you encounter are often unpredictable. The beauty of Wikidata is that you can focus on your specific areas of interest and rarely run into conflicts, especially when working with straightforward data like birth and death dates, political affiliations, or party memberships. When you have reliable sources, such as Tvåkammar-riksdagen 1867–1970, it becomes even easier. However, with 14 million monthly edits, opinions vary widely: some might advocate for the use of primary sources, while others delete items and rely only on established resources like the Swedish National Archives (Riksarkivet) or SBL (a project I doubt will ever be delivered, more like "project drift" than scope creep; see "Riksarkivet SBL Projekt på drift - för 62 år sedan var alla överens om att längre än 30 år till fick det inte ta", i.e. "Riksarkivet SBL project adrift: 62 years ago everyone agreed that it must not take more than another 30 years").

Since Wikipedia articles are one of the primary consumers of Wikidata, this also influences changes to Wikidata. Some Wikipedia contributors, for instance, do not advocate for the use of church records, as they may classify this as "original research" --> they vote to delete those sources...

My advice: avoid asserting that Wikidata can definitively be used for specific purposes like professions or identifying parties. Instead, back up the data you find useful. Treating "Wikidata as the source", despite its shortcomings, volatility, and frequent lack of reliable references to support the information, requires careful consideration.

Authority control properties having a WD prop SWERIK P12192 and is a human

image

External properties having a WD prop SWERIK P12192 and is a human

image image image

External properties having a WD prop SWERIK P12192 and is a human and is Swedish

image image image

WD properties having a WD prop SWERIK P12192 and is a human

image image image image image image

SWERIKS and has described by Source P1343

image image image

SWERIKS and has "archives at" P485

image

SWERIKS and has "significant event" P793

image image

@salgo60
Author

salgo60 commented Jan 14, 2025

Looks like Riksdagen has added some Swedish PMs, also old ones - issue #184

@MansMeg
Contributor

MansMeg commented Jan 14, 2025

Yes. We should do a sync against both Wikidata and the open data soon. Then check mismatches.
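
The mismatch check could look something like the sketch below. It is only an illustration under assumptions: each source is assumed to have already been exported to a dict keyed by SWERIK ID, and `find_mismatches` and the field name `born` are hypothetical, not part of the SWERIK code base.

```python
def find_mismatches(ours, theirs, fields=("born",)):
    """Compare two {swerik_id: record} mappings.

    Returns the IDs found only in `ours`, the IDs found only in `theirs`,
    and a list of (swerik_id, field, our_value, their_value) tuples for
    shared IDs whose field values differ.
    """
    only_ours = sorted(ours.keys() - theirs.keys())
    only_theirs = sorted(theirs.keys() - ours.keys())
    differing = []
    for swerik_id in sorted(ours.keys() & theirs.keys()):
        for field in fields:
            our_value = ours[swerik_id].get(field)
            their_value = theirs[swerik_id].get(field)
            if our_value != their_value:
                differing.append((swerik_id, field, our_value, their_value))
    return only_ours, only_theirs, differing
```

Running it once against Wikidata and once against the open data would give two separate reports, which keeps it clear which source disagrees with the local data.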

@salgo60
Author

salgo60 commented Jan 14, 2025

@MansMeg I guess Wikidata is your friend

  1. I added some new people with P8388 Riksdagens person-GUID
    1-1) e.g. using OpenRefine to upload
image image
  1. I found maybe 100 people that they have added, born around 1915 --> we have the person in Wikidata and we have SWERIKS on them, but as Riksdagen has no external identifiers I matched them manually and added P8388 Riksdagens person-GUID
    --> the easiest for you is to just check whether objects with SWERIK have P8388 Riksdagens person-GUID
image

The sad thing is how many Swedish PMs are missing in Riksdagens Öppna data

  • SPARQL: has SWERIKS but is missing Property:P8388 = 3752 records
image
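
A query of this kind can be sketched as below. This is an assumption-laden sketch, not the exact query behind the count above: it asks the public Wikidata Query Service for items that have P12192 (SWERIK) but no P8388 (Riksdagens person-GUID), and the helper names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Items with a SWERIK ID (P12192) but without a Riksdagen person-GUID (P8388).
MISSING_GUID_QUERY = """
SELECT ?item ?swerik WHERE {
  ?item wdt:P12192 ?swerik .
  FILTER NOT EXISTS { ?item wdt:P8388 ?guid . }
}
"""


def build_request_url(query, endpoint=WDQS_ENDPOINT):
    """Encode the query as a GET request URL asking for JSON results."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def parse_bindings(payload):
    """Extract (QID, SWERIK ID) pairs from a SPARQL JSON result document."""
    rows = []
    for binding in payload["results"]["bindings"]:
        qid = binding["item"]["value"].rsplit("/", 1)[-1]
        rows.append((qid, binding["swerik"]["value"]))
    return rows


def run_query(query):
    """Send the query to the endpoint and return the parsed rows (needs network access)."""
    request = urllib.request.Request(
        build_request_url(query),
        headers={"User-Agent": "swerik-sync-sketch/0.1 (hypothetical example)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_bindings(json.load(response))
```

`len(run_query(MISSING_GUID_QUERY))` then gives the number of such items at the time the query is run, which will drift as P8388 values are added.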

I did a bar chart of birth year for people in Riksdagens Öppna data in the Notebook

image
  • the same, but for people in Wikidata who have SWERIK, binned by decade
image
  • the same, binned by year
image
  1. I also did some OpenRefine work on the data I found with the Notebook
image image image
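
The decade binning behind bar charts like the ones above can be sketched as follows. This is a minimal sketch assuming the birth years have already been extracted as integers; `bin_by_decade` and `text_bar_chart` are hypothetical helper names, and the text chart only stands in for whatever plotting library the notebook actually uses.

```python
from collections import Counter


def bin_by_decade(birth_years):
    """Count birth years per decade, e.g. 1915 falls into the 1910 bin."""
    counts = Counter((year // 10) * 10 for year in birth_years)
    return dict(sorted(counts.items()))


def text_bar_chart(counts, width=40):
    """Render one line per bin, scaled so the largest bin is `width` characters wide."""
    if not counts:
        return ""
    largest = max(counts.values())
    lines = []
    for decade, n in counts.items():
        bar = "#" * max(1, round(n * width / largest))
        lines.append(f"{decade}s {bar} {n}")
    return "\n".join(lines)
```

Binning by single year instead only requires skipping the `// 10` step, which is why the two charts can share the rest of the code.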

This week I found lots of text strings about the Riksdag members that Riksdagen has...

See the video: there are lots of text strings about what many of the members who have Property:P8388 have done, but it is only unstructured text. It seems that birth date and name are the only structured fields...

  • It's a pity that Riksdagen doesn't work more on its metadata... and tell people what they are doing...
    • That they don't use SWEPUB or Wikidata for better data is probably a good indication that this will never be good

Example: Riksdagens person-GUID P8388 - a9b3d62e-7665-47bd-98b0-449545cc6c05

CLICK ON "BIOGRAFI" (biography)

image image image
