Skip to content

Commit

Permalink
Add files via upload
Browse files Browse the repository at this point in the history
  • Loading branch information
anketaube authored Feb 22, 2024
1 parent 104361b commit 5854226
Showing 1 changed file with 142 additions and 0 deletions.
142 changes: 142 additions & 0 deletions content/DNB_SRU_Tutorial_Workshop.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
{
"metadata": {
"kernelspec": {
"name": "python",
"display_name": "Python (Pyodide)",
"language": "python"
},
"language_info": {
"codemirror_mode": {
"name": "python",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8"
},
"anaconda-cloud": {}
},
"nbformat_minor": 4,
"nbformat": 4,
"cells": [
{
"cell_type": "markdown",
"source": "# DNBLab Jupyter Notebook zur SRU Abfrage",
"metadata": {
"nbpresent": {
"id": "d0d72cd5-034d-4b94-a4e2-54f7753cb9f0"
}
}
},
{
"cell_type": "markdown",
"source": "## SRU - Schnittstellenabfrage und Datenauslieferung",
"metadata": {
"nbpresent": {
"id": "558898c3-e162-4e46-92d5-b4c914325790"
}
}
},
{
"cell_type": "markdown",
"source": "Dieses DNBLab-Tutorial beschreibt eine Beispielabfrage über die SRU-Schnittstelle mit Python. In dem Jupyter Notebook kann der dokumentierte Code direkt ausgeführt und angepasst werden. Das Tutorial umfasst die exemplarische Abfrage und Ausgabe der Daten im Format MARC21-xml zur weiteren Verarbeitung und den Export der Ergebnisse als CSV-Datei. ",
"metadata": {
"nbpresent": {
"id": "ee78241c-36e7-45e6-b015-0ef46f1e777d"
}
}
},
{
"cell_type": "code",
"source": "import urllib.parse\nfrom pyodide.http import open_url, pyfetch\nfrom js import fetch\nfrom bs4 import BeautifulSoup as soup\nimport unicodedata\nfrom lxml import etree\nimport pandas as pd",
"metadata": {
"trusted": true
},
"outputs": [],
"execution_count": 1
},
{
"cell_type": "markdown",
"source": "## Arbeitsumgebung einrichten! <a class=\"anchor\" id=\"Teil1\"></a>",
"metadata": {
"nbpresent": {
"id": "3c6f6024-cb7c-4b65-8646-a753069cb5a4"
},
"tags": []
}
},
{
"cell_type": "markdown",
"source": "## Abfrage über alle Datensätze... <a class=\"anchor\" id=\"Teil2\"></a>",
"metadata": {}
},
{
"cell_type": "code",
"source": "async def dnb_sru(query):\n \n base_url = \"https://services.dnb.de/sru/dnb\"\n params = {'recordSchema' : 'MARC21-xml',\n 'operation': 'searchRetrieve',\n 'version': '1.1',\n 'maximumRecords': '100',\n 'query': query\n }\n \n r = await fetch(base_url + \"?\" + urllib.parse.urlencode(params)) \n r_text = await r.text()\n xml = soup(r_text, features=\"xml\")\n records = xml.find_all('record', {'type':'Bibliographic'})\n \n \n if len(records) < 100:\n \n return records\n \n else:\n \n num_results = 100\n i = 101\n while num_results == 100:\n \n params.update({'startRecord': i})\n r = await fetch(base_url + \"?\" + urllib.parse.urlencode(params)) \n r_text = await r.text()\n xml = soup(r_text, features=\"xml\")\n new_records = xml.find_all('record', {'type':'Bibliographic'})\n records+=new_records\n i+=100\n num_results = len(new_records)\n \n return records",
"metadata": {
"trusted": true
},
"outputs": [],
"execution_count": 2
},
{
"cell_type": "markdown",
"source": "## Felder Titel und Links zu den Objekten durchsuchen... <a class=\"anchor\" id=\"Teil3\"></a>",
"metadata": {}
},
{
"cell_type": "code",
"source": "def parse_record(record):\n \n ns = {\"marc\":\"http://www.loc.gov/MARC21/slim\"}\n xml = etree.fromstring(unicodedata.normalize(\"NFC\", str(record)))\n \n #idn\n idn = xml.xpath(\"marc:controlfield[@tag = '001']\", namespaces=ns)\n try:\n idn = idn[0].text\n except:\n idn = 'fail'\n \n # link\n link = xml.xpath(\"marc:datafield[@tag = '856']/marc:subfield[@code = 'u']\", namespaces=ns)\n \n try:\n link = link[0].text\n #titel = unicodedata.normalize(\"NFC\", titel)\n except:\n link = \"unkown\"\n \n # titel\n titel = xml.xpath(\"marc:datafield[@tag = '245']/marc:subfield[@code = 'a']\", namespaces=ns)\n \n try:\n titel = titel[0].text\n #titel = unicodedata.normalize(\"NFC\", titel)\n except:\n titel = \"unkown\"\n \n \n meta_dict = {\"idn\":idn,\n \"titel\":titel,\n \"link\":link}\n \n return meta_dict",
"metadata": {
"trusted": true
},
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"source": "## Abfrage nach Suchwort \"Pandemie\" im Titel + Links zu frei verfügbaren Objekten!",
"metadata": {}
},
{
"cell_type": "code",
"source": "records = await dnb_sru('tit=Pandemie and location=onlinefree')\nprint(len(records), 'Ergebnisse')",
"metadata": {
"trusted": true
},
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"source": "## Anzeige der Treffer in einer Tabelle... <a class=\"anchor\" id=\"Teil4\"></a>",
"metadata": {}
},
{
"cell_type": "code",
"source": "output = [parse_record(record) for record in records]\ndf = pd.DataFrame(output)\ndf",
"metadata": {
"trusted": true
},
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"source": "## Speichern der Ergebnisse als CSV-Datei!",
"metadata": {}
},
{
"cell_type": "code",
"source": "df.to_csv(\"SRU_Titel.csv\", index=False)",
"metadata": {
"trusted": true
},
"outputs": [],
"execution_count": 18
}
]
}

0 comments on commit 5854226

Please sign in to comment.