Announcement Archives - DBpedia Association https://www.dbpedia.org/announcement/ Global and Unified Access to Knowledge Graphs Fri, 01 Mar 2024 09:21:24 +0000

GSoC 2024 – Call for Contributors https://www.dbpedia.org/blog/gsoc-2024-call-for-contributors/ Fri, 01 Mar 2024 09:21:00 +0000

The post GSoC 2024 – Call for Contributors appeared first on DBpedia Association.

Are you a student looking for a summer experience that combines coding skills with open source development? Then look no further than the Google Summer of Code program 2024, where you can join forces with DBpedia to help advance the state of the art in semantic web technologies. Build your skills and gain valuable experience while making a real impact on the tech community!

We have been accepted to be part of this incredible program to support young ambitious developers who want to work with open-source organizations like DBpedia. So far, each year has brought us new project ideas, many amazing students and great project results that shaped the future of DBpedia. Even though Covid-19 changed a lot in the world, it couldn’t shake Google Summer of Code (GSoC) much. The program, designed to mentor youngsters from afar, is almost too perfect for us. One of the advantages of GSoC, especially in times like these, is the chance to work on projects remotely while still getting a first deep dive into open source projects like ours.

DBpedia is now looking for contributors who want to work with us during the upcoming summer months.  

What is Google Summer of Code?

Google Summer of Code is a global program focused on bringing developers into open source software development. Funds are given to new and beginner open source contributors over 18 years of age to work for two and a half months (or longer) on a specific task. For GSoC newbies, this short video and the information provided on the GSoC website explain all there is to know about GSoC2024.

And this is how it works …

Step 1: Check out one of our projects here or draft your own.
Step 2: Get in touch with our mentors as soon as possible and write up a project proposal of at least 8 pages. Information about our proposal structure and a template are available here.
Step 3: After a selection phase, contributors are matched with a specific project and mentor(s) and start working on the project.

Application Procedure GSoC2024

Further information on the application procedure is available in our DBpedia Guidelines. There you will find information on how to contact us and how to appropriately apply for GSoC2024. Please also note the official GSoC 2024 timeline for your proposal submission and make sure to submit on time. Unfortunately, extensions cannot be granted. The final submission deadline is April 2, 2024, at 18:00 UTC.

Contact

Detailed information on how to apply is available on the DBpedia website. We’ve prepared an information kit for you. Please find all necessary information regarding the student application procedure here.

And in case you still have questions, please do not hesitate to contact us via dbpedia@infai.org.

Stay safe and check Twitter or LinkedIn. Furthermore, you can subscribe to our Newsletter for the latest news and information around DBpedia.

Finally, we are looking forward to your contribution!

Yours DBpedia Association

A year with DBpedia – Retrospective Part 2/2023 https://www.dbpedia.org/blog/a-year-with-dbpedia-retrospective-part-2-2023/ Thu, 04 Jan 2024 13:45:24 +0000

The post A year with DBpedia – Retrospective Part 2/2023 appeared first on DBpedia Association.

This is the final part of our journey through 2023. In the previous blog post we presented the DBpedia highlights. Now we will take a look at the second half of 2023 and give an outlook for 2024.

Tutorial @ Language, Data and Knowledge conference

On 13th of September, 2023, an exciting tutorial took place at the University of Vienna in the Center for Translation Studies as part of the LDK 2023. The LDK conference focuses on the acquisition, maintenance and use of language data in the context of data science and knowledge-based applications. The tutorial was opened by Milan Dojchinovski (InfAI, DBpedia Association, CTU in Prague). This was followed by three sessions, which were accompanied by many real-world practical use cases, on the DBpedia Knowledge Graph, the infrastructure and the use of the Databus data publishing platform. Check more details on our events page.

DBpedia Day @ SEMANTiCS in Leipzig 

DBpedia Day was once again part of the program at this year’s SEMANTiCS conference 2023. It was held on 20th of September at the HYPERION Hotel Leipzig with up to 100 DBpedians. Once again, our CEO Sebastian Hellmann opened the day with a presentation of the “DBpedia Databus version 2.1.0”. This was followed by the exciting keynote speech “Towards Foundation Models for Data Spaces” by Edward Curry from the University of Galway, Ireland. Afterwards, we organized the member session and the DBpedia Science Talk session. All slides can also be found on our events page.

Databus

Databus pre-launch announcement

We are in the final stage of the DBpedia Databus open software release (GitHub). Remaining issues include quality of life and UI improvements. Check out the Databus feature matrix for our lightweight, scalable, adaptable, powerful Data Catalog Platform (direct download link, persistent data identifier on the databus). Contact dbpedia@infai.org for demo, business, or research proposal inquiries.

Databus excels at cataloging decentralized data of any file type using RDF/DCAT. We selected a few initial focal use cases, where the Databus serves as:

  1. AIModelHub for AI training data, models, validation, and deployment.
  2. Research Data Management Catalog for research institutes and communities.
  3. Supply-Chain-Management Platform for product information collection along the supply chain and construction of Digital Product Passports.
  4. Community Data Portal, e.g., for the DBpedia Community.

DBpedia Contributions will be enabled soon, taking DBpedia to the moon! 🚀

In DBpedia’s future, the Databus will be used to collect community contributions more effectively, giving DBpedia an enormous boost in quantity and quality. https://databus.dbpedia.org already catalogs over 350k files with over 1 million file downloads per month! We are preparing showcases, templates, and documentation for these community contribution types:

  1. Community Extensions such as caligraph.org or AI-improved abstracts.
  2. Community Link Contributions for inclusion in the main graph.
  3. RDF profiles for DBpedia Users and Members (FOAF, Schema.org, WebID) via Databus Accounts (including publication of expertise).
  4. Dockerized RDF Tool Deployment so you can automatically load DBpedia and other RDF data into your favorite RDF tools via Databus collections. Our Databus-powered Virtuoso SPARQL Endpoint Quickstart Docker has already been deployed over 150k times! 

We do hope we will meet you and some new faces during our events next year. The association wants to get to know you because DBpedia is a community effort and would not continue to develop, improve and grow without you. We plan to have a tutorial at the LREC-COLING 2024 conference and a meeting at the SEMANTiCS conference, September 17-19, 2024, in Amsterdam, Netherlands.

Stay safe and check Twitter, Instagram and LinkedIn or subscribe to our Newsletter for the latest news and information.

Yours,

Julia & Maria

on behalf of the DBpedia Association

Recap 2023: A Year with DBpedia https://www.dbpedia.org/blog/recap-2023-a-year-with-dbpedia/ Mon, 04 Dec 2023 11:50:24 +0000

The post Recap 2023: A Year with DBpedia appeared first on DBpedia Association.


Can you believe it? Sixteen years ago the first DBpedia dataset was released. Sixteen years of development, improvements and growth. Now more than 4,100 GByte of data is hosted on the DBpedia Databus. We want to take this as an opportunity to send out a big “Thank you!” to all contributors, developers, members, hosters, funders, believers and DBpedia enthusiasts who made that possible. Thank you for your support!

In the upcoming blog series, we will take you on a retrospective tour through 2023. Furthermore, we will give you insights into a year with DBpedia. In the following we will also highlight our past events. 

Snapshot Release

We are pleased to announce immediate availability of a new edition of the free and publicly accessible SPARQL Query Service Endpoint and Linked Data Pages, for interacting with the new Snapshot Dataset. In 2023, we released the 2022-12 snapshot, which includes all features added since version 2022-09. The current Snapshot Release contains more than 850 million facts (triples). Please check more details on our website.

Google Summer of Code (GSoC)

For the 12th year in a row, we have been able to support and guide young, ambitious developers who have joined us as an open source organization. We encouraged them to work on a programming project this summer. Each year we have been inspired by new project ideas, many amazing contributors, and mostly great project results that have shaped the future of DBpedia. If you want deeper insights into our GSoC contributors’ work, you can find their blogs and repos on the DBpedia blog.

DBpedia @ Leipzig Semantic Web Day

On June 28, 2023, Sebastian Hellmann presented the DBpedia Databus 2.1 at Data Week Leipzig. Data Week is the networking and exchange event for highlighting scientific, economic, and social perspectives of data and its use, where industry, citizens, science, and public authorities can enter into dialogue. Data Week Leipzig took place June 26-30, 2023. Please find Sebastian’s slides here.

In the upcoming blog post after the holidays, we will give you more insights into the past events and technical achievements. We are now looking forward to the year 2024. The DBpedia team plans to have a tutorial at the LREC-COLING 2024 conference and the DBpedia Day at the SEMANTiCS 2024 conference in Amsterdam, Netherlands.

Above all, we wish you a merry Christmas and a happy new year. In the meantime, stay tuned and check our Twitter, Instagram or LinkedIn channels. You can subscribe to our Newsletter for the latest news and information around DBpedia.

Julia & Maria,   

on behalf of the DBpedia Association

Retrospective 2023 – Half a year with DBpedia https://www.dbpedia.org/blog/retrospective-2023-half-a-year-with-dbpedia/ Tue, 04 Jul 2023 11:10:25 +0000

The post Retrospective 2023 – Half a year with DBpedia appeared first on DBpedia Association.

Already, half of the year 2023 has passed by. Time for us to look back on the past half year. What have we achieved? What still lies ahead of us? In the following, we will take you on a retrospective tour through the first half of 2023. We will highlight our past events and the development around the DBpedia dataset. Have fun reading!

DBpedia is part of the Google Summer of Code project 2023

So far, each year has brought us new project ideas, many amazing students and great project results that shaped the future of DBpedia. Like every year, we received many fantastic applications this year. Out of these applications, 6 great projects from contributors all over the world were selected to work together with our mentors. Right now the contributors are in the middle of the coding phase. If you want to know more about this year’s projects, go and have a look at the DBpedia blog.

DBpedia Snapshot 2022-12 Release

We are pleased to announce immediate availability of a new edition of the free and publicly accessible SPARQL Query Service Endpoint and Linked Data pages, for interacting with the new Snapshot Dataset. Check our blog!

Leipzig Semantic Web Day

On June 28, 2023, Sebastian Hellmann presented the DBpedia Databus 2.1 at Data Week Leipzig. Data Week is the networking and exchange event for highlighting scientific, economic, and social perspectives of data and its use, where industry, citizens, science, and public authorities can enter into dialogue. Data Week Leipzig took place June 26-30, 2023. Please find Sebastian’s slides here.

What Will the Future Bring?

We are now looking forward to the LDK conference, which will take place September 12-15, 2023, in Vienna, Austria. We will organize a tutorial on September 13, 2023. If you would like to join, please check more details on our event page. After that, we’ll fly straight back to Leipzig, because the SEMANTiCS conference will be held at the Hyperion Hotel Leipzig from September 20 to 22, 2023. At the beginning of the conference, we will host the DBpedia Day on September 20, 2023.

Stay safe and check Twitter or LinkedIn. Furthermore, you can subscribe to our newsletter for the latest news and information around DBpedia.

Julia

on behalf of the DBpedia Association

DBpedia Snapshot 2022-12 Release https://www.dbpedia.org/blog/dbpedia-snapshot-2022-12-release/ Mon, 27 Mar 2023 09:36:32 +0000

The post DBpedia Snapshot 2022-12 Release appeared first on DBpedia Association.

We are pleased to announce immediate availability of a new edition of the free and publicly accessible SPARQL Query Service Endpoint and Linked Data Pages, for interacting with the new Snapshot Dataset. 

News since DBpedia Snapshot 2022-09

  • New Abstract Extractor due to GSOC 2022 (credits to Celian Ringwald) 

Work in progress: Smoothing the community issue reporting and fixing on GitHub

What is the “DBpedia Snapshot” Release?

Historically, this release has been associated with many names: “DBpedia Core”, “EN DBpedia”, and — most confusingly — just “DBpedia”. In fact, it is a combination of —

  • EN Wikipedia data — A small, but very useful, subset (~ 1 Billion triples or 14%) of the whole DBpedia extraction using the DBpedia Information Extraction Framework (DIEF), comprising structured information extracted from the English Wikipedia plus some enrichments from other Wikipedia language editions, notably multilingual abstracts in ar, ca, cs, de, el, eo, es, eu, fr, ga, id, it, ja, ko, nl, pl, pt, sv, uk, ru, zh.
  • Links — 62 million community-contributed cross-references and owl:sameAs links to other linked data sets on the Linked Open Data (LOD) Cloud that make it possible to effectively find and retrieve further information from the largest, decentralized, change-sensitive knowledge graph on earth that has formed around DBpedia since 2007.
  • Community extensions — Community-contributed extensions such as additional ontologies and taxonomies. 

Release Frequency & Schedule

Going forward, releases will be scheduled for the 1th of February, May, August, and November (with +/- 5 days tolerance), and are named using the same date convention as the Wikipedia Dumps that served as the basis for the release. An example of the release timeline is shown below: 

  • December 6–8: Wikipedia dumps for June 1 become available on https://dumps.wikimedia.org/
  • December 8–20: Download and extraction with DIEF
  • Dec 20–Jan 1: Post-processing and quality-control period
  • Jan 1–Feb 15: Linked Data and SPARQL endpoint deployment

Data Freshness

Given the timeline above, the EN Wikipedia data of DBpedia Snapshot has a lag of 1-4 months. We recommend the following strategies to mitigate this:

  1. DBpedia Snapshot as a kernel for Linked Data: Following the Linked Data paradigm, we recommend using the Linked Data links to other knowledge graphs to retrieve high-quality and recent information. DBpedia’s network consists of the best knowledge engineers in the world, working together, using linked data principles to build a high-quality, open, decentralized knowledge graph network around DBpedia. Freshness and change-sensitivity are two of the greatest data-related challenges of our time, and can only be overcome by linking data across data sources. The “Big Data” approach of copying data into a central warehouse is inevitably challenged by issues such as co-evolution and scalability. 
  2. DBpedia Live: Wikipedia is unmistakably the richest, most recent body of human knowledge and source of news in the world. DBpedia Live is just minutes behind edits on Wikipedia, which means that as soon as any of the 120k Wikipedia editors press the “save” button, DBpedia Live will extract fresh data and update. DBpedia Live consists of the DBpedia Live Sync API (for syncing into any kind of on-site databases), Linked Data and SPARQL endpoint.
  3. Latest-Core is a dynamically updating Databus Collection. Our automated extraction robot “MARVIN” publishes monthly dev versions of the full extraction, which are then refined and enriched to become Snapshot.      

Data Quality & Richness

We would like to acknowledge the excellent work of Wikipedia editors (~46k active editors for EN Wikipedia), who are ultimately responsible for collecting information in Wikipedia’s infoboxes, which are refined by DBpedia’s extraction into our knowledge graphs. Wikipedia’s infoboxes are steadily growing each month and according to our measurements grow by 150% every three years. EN Wikipedia’s infoboxes even doubled in this timeframe. This richness of knowledge drives the DBpedia Snapshot knowledge graph and is further potentiated by synergies with linked data cross-references. Statistics are given below.

Data Access & Interaction Options

Linked Data

Linked Data is a principled approach to publishing RDF data on the Web that enables interlinking data between different data sources, courtesy of the built-in power of Hyperlinks as unique Entity Identifiers.


HTML pages comprising Hyperlinks that conform to Linked Data Principles are one of the methods of interacting with data provided by the DBpedia Snapshot, be it manually via the web browser or programmatically using REST interaction patterns via the https://dbpedia.org/resource/{entity-label} pattern. Naturally, we encourage Linked Data interactions, while also expecting user-agents to honor the cache-control HTTP response header for massive crawl operations. See the instructions for accessing Linked Data, which is available in 10 formats.
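To make the programmatic pattern a bit more concrete, here is a minimal Python sketch that only builds (and does not send) a request for one entity. The helper name build_entity_request, the underscore handling and the text/turtle media type are our assumptions for illustration; please check the access instructions for the formats that are actually served.

```python
from urllib.parse import quote
from urllib.request import Request

# Pattern taken from the text: https://dbpedia.org/resource/{entity-label}
RESOURCE_BASE = "https://dbpedia.org/resource/"


def build_entity_request(entity_label: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) a GET request for one DBpedia entity.

    Hypothetical helper: the media type is an assumed example of
    content negotiation, not a statement about the supported formats.
    """
    # Assumption: resource names use underscores where labels use spaces.
    name = entity_label.strip().replace(" ", "_")
    url = RESOURCE_BASE + quote(name, safe="_(),'")
    return Request(url, headers={"Accept": media_type})


req = build_entity_request("Leipzig")
print(req.full_url)
print(req.get_header("Accept"))
```

Sending the request is then a matter of passing it to urllib.request.urlopen. For larger crawls, a client should additionally respect the cache-control response header mentioned above.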

SPARQL Endpoint

This service enables some astonishing queries against Knowledge Graphs derived from Wikipedia content. The Query Services Endpoint that makes this possible is identified by http://dbpedia.org/sparql, and it currently handles 7.2 million queries daily on average. See powerful queries and instructions (incl. rates and limitations).

An effective Usage Pattern is to filter a relevant subset of entity descriptions for your use case via SPARQL and then combine with the power of Linked Data by looking up (or de-referencing) data via owl:sameAs property links en route to retrieving specific and recent data from across other Knowledge Graphs across the massive Linked Open Data Cloud.
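The usage pattern above could look roughly like the following Python sketch. It builds a SPARQL query that selects entities of one class together with their owl:sameAs links and wraps it in a GET request using the standard query parameter of the SPARQL protocol. The function names, the example class and the LIMIT value are our assumptions for illustration, not an official recipe.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as named in the text above.
SPARQL_ENDPOINT = "http://dbpedia.org/sparql"


def same_as_query(class_uri: str, limit: int = 10) -> str:
    """Select entities of one class together with their owl:sameAs links."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT ?entity ?other WHERE {\n"
        f"  ?entity a <{class_uri}> ;\n"
        "          owl:sameAs ?other .\n"
        f"}} LIMIT {int(limit)}"
    )


def build_sparql_request(query: str) -> Request:
    """Wrap a query in a GET request (SPARQL protocol 'query' parameter)."""
    url = SPARQL_ENDPOINT + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format via content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


query = same_as_query("http://dbpedia.org/ontology/VideoGame", limit=5)
request = build_sparql_request(query)
print(query)
print(request.full_url)
```

The ?other values returned by such a query are the URIs one would then look up (de-reference) in the other knowledge graphs. Please keep the rates and limitations of the public endpoint in mind.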

Additionally, DBpedia Snapshot dumps and additional data from the complete collection of datasets derived from Wikipedia are provided by the DBpedia Databus for use in your own SPARQL-accessible Knowledge Graphs.

DBpedia Ontology

This Snapshot Release was built with DBpedia Ontology (DBO) version: https://databus.dbpedia.org/ontologies/dbpedia.org/ontology--DEV/2021.11.08-124002 We thank all DBpedians for the contribution to the ontology and the mappings. See documentation and visualizations, class tree and properties, wiki.

DBpedia Snapshot Statistics

Overview. Overall the current Snapshot Release contains more than 850 million facts (triples).

At its core, the DBpedia ontology is the heart of DBpedia. Our community is continuously contributing to the DBpedia ontology schema and the DBpedia infobox-to-ontology mappings by actively using the DBpedia Mappings Wiki.

The current Snapshot Release utilizes a total of 55 thousand properties, of which 1377 are defined by the DBpedia ontology.

Classes. Knowledge in Wikipedia is constantly growing at a rapid pace. We use the DBpedia Ontology Classes to measure the growth: Total number in this release (in brackets we give: a) growth to the previous release, which can be negative temporarily and b) growth compared to Snapshot 2016-10): 

  • Persons: 1792308 (1.01%, 1.13%)
  • Places: 748372 (1.00%, 1820.86%), including but not limited to 590481 (1.00%, 5518.51%) populated places
  • Works: 610589 (1.00%, 619.89%), including but not limited to
    • 157566 (1.00%, 1.38%) music albums
    • 144415 (1.01%, 15.94%) films
    • 24829 (1.01%, 12.53%) video games
  • Organizations: 345523 (1.01%, 109.31%), including but not limited to
    • 87621 (1.01%, 2.25%) companies
    • 64507 (1.00%, 64507.00%) educational institutions
  • Species: 1933436 (1.01%, 322239.33%)
  • Plants: 7718 (0.82%, 1.71%)
  • Diseases: 10591 (1.00%, 8.54%)

Detailed Growth of Classes: The image below shows the detailed growth for one class. Click on the links for other classes: Place, PopulatedPlace, Work, Album, Film, VideoGame, Organisation, Company, EducationalInstitution, Species, Plant, Disease. For further classes adapt the query by replacing the <http://dbpedia.org/ontology/CLASS> URI. Note that 2018 was a development phase with some failed extractions. The stats were generated with the Databus VOID Mod.
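As a rough illustration of the "replace the CLASS URI" idea, the sketch below fills a class name into a simple counting query. Please note that this is not the query behind the statistics above (those come from the Databus VOID Mod). The template and the helper count_query_for are our assumptions, and the result of running such a query against a live endpoint may differ from the published numbers.

```python
# Illustrative template; CLASS is the placeholder named in the text above.
COUNT_TEMPLATE = """\
SELECT (COUNT(DISTINCT ?instance) AS ?total) WHERE {
  ?instance a <http://dbpedia.org/ontology/CLASS> .
}"""


def count_query_for(class_name: str) -> str:
    """Return the counting query for one DBpedia ontology class name."""
    # Keep the substitution safe: only plain names such as "VideoGame".
    if not class_name.isalnum():
        raise ValueError(f"not a plain class name: {class_name!r}")
    return COUNT_TEMPLATE.replace("CLASS", class_name)


print(count_query_for("VideoGame"))
```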

Links. Linked Data cross-references between decentralized datasets are the foundation and access point to the Linked Data Web. The latest Snapshot Release provides over 130.6 million links from 7.62 million entities to 179 external sources.

Top 11

  33,975,305 http://www.wikidata.org
   7,206,254 https://global.dbpedia.org
   4,308,772 http://yago-knowledge.org
   3,855,108 http://de.dbpedia.org
   3,731,002 http://fr.dbpedia.org
   2,991,921 http://viaf.org
   2,929,808 http://it.dbpedia.org
   2,925,530 http://es.dbpedia.org
   2,788,703 http://fa.dbpedia.org
   2,587,004 http://ru.dbpedia.org
   2,580,398 http://sr.dbpedia.org

Top 10 without DBpedia namespaces

  33,975,305 http://www.wikidata.org
   4,308,772 http://yago-knowledge.org
   2,991,921 http://viaf.org
   1,708,533 http://d-nb.info
     612,227 http://sws.geonames.org
     596,134 http://umbel.org
     537,602 http://data.bibliotheken.nl
     430,839 http://www.w3.org
     422,989 http://musicbrainz.org
     104,433 http://linkedgeodata.org
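For readers who want to compute a similar ranking on their own link sets, here is a small Python sketch that groups link targets by scheme and host and returns the most frequent ones. This is only an assumption about how such a table can be derived, not the procedure used for the numbers above, and the sample URIs are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_link_targets(link_targets, n=10):
    """Count link target URIs per scheme://host and return the n largest."""
    counts = Counter()
    for uri in link_targets:
        parts = urlsplit(uri)
        counts[f"{parts.scheme}://{parts.netloc}"] += 1
    return counts.most_common(n)


# Made-up sample targets, for illustration only.
sample_targets = [
    "http://www.wikidata.org/entity/Q1",
    "http://www.wikidata.org/entity/Q2",
    "http://viaf.org/viaf/0000",
]
for host, total in top_link_targets(sample_targets):
    print(f"{total:>10,} {host}")
```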

DBpedia Extraction Dumps on the Databus

All extracted files are reachable via the DBpedia account on the Databus. The Databus has two main structures:

Snapshot Download. For downloading DBpedia Snapshot, we prepared this collection, which also includes detailed release notes:

https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03

The collection is roughly equivalent to http://downloads.dbpedia.org/2016-10/core/

Collections can be downloaded in many different ways; some download modalities, such as bash script, SPARQL, and plain URL list, are found in the tabs at the collection. Files are provided as bzip2-compressed N-Triples files. In case you need a different format or compression, you can also use the “Download-As” function of the Databus Client (GitHub), e.g. -s $collection -c gzip would download the collection and convert it to GZIP during download.
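Once a file has been downloaded, it can be read without unpacking it first. The following Python sketch streams a bzip2-compressed N-Triples file line by line using only the standard library. The line splitting is deliberately naive (it is an assumption that each line holds exactly one triple and ends with " ."); a full RDF parser is the safer choice for production use. The function names are ours.

```python
import bz2
import io


def iter_triples(stream):
    """Yield (subject, predicate, object) strings from N-Triples text lines."""
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Drop the terminating "." of the statement.
        body = line[:-1].rstrip() if line.endswith(".") else line
        # Split only twice so that literals containing spaces stay intact.
        subject, predicate, obj = body.split(None, 2)
        yield subject, predicate, obj


def read_bz2_ntriples(path_or_file):
    """Stream triples from a .bz2 file path or an open binary file object."""
    with bz2.open(path_or_file, mode="rt", encoding="utf-8") as handle:
        yield from iter_triples(handle)


# Self-contained demo with an in-memory file instead of a real download.
sample = (
    "<http://dbpedia.org/resource/Leipzig> "
    "<http://www.w3.org/2000/01/rdf-schema#label> "
    '"Leipzig"@en .\n'
)
compressed = io.BytesIO(bz2.compress(sample.encode("utf-8")))
for triple in read_bz2_ntriples(compressed):
    print(triple)
```

Streaming keeps memory usage flat, which matters because the dump files can be large.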

Replicating DBpedia Snapshot on your server can be done via Docker, see https://hub.docker.com/r/dbpedia/virtuoso-sparql-endpoint-quickstart 

git clone https://github.com/dbpedia/virtuoso-sparql-endpoint-quickstart.git

cd virtuoso-sparql-endpoint-quickstart

COLLECTION_URI=https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-09 VIRTUOSO_ADMIN_PASSWD=password docker-compose up

Download files from the whole DBpedia extraction. The whole extraction consists of approx. 20 billion triples and 5000 files created from 140 languages of Wikipedia, Commons and Wikidata. They can be found in https://databus.dbpedia.org/dbpedia/(generic|mappings|text|wikidata)

You can copy-edit a collection and create your own customized collections via “Actions” -> “Copy Edit”; e.g., you can copy-edit the snapshot collection above, remove some files that you do not need and add files from other languages. Please see the Rhizomer use case: Best way to download specific parts of DBpedia. Of course, this only refers to the archived dumps on the Databus for users who want to bulk download and deploy into their own infrastructure. Linked Data and SPARQL allow for filtering the content using a small data pattern.

Acknowledgments

First and foremost, we would like to thank our open community of knowledge engineers for finding & fixing bugs and for supporting us by writing data tests. We would also like to acknowledge the DBpedia Association members for constantly innovating in the areas of knowledge graphs and linked data and pushing the DBpedia initiative with their know-how and advice. OpenLink Software supports DBpedia by hosting SPARQL and Linked Data; the University of Mannheim, the German National Library of Science and Technology (TIB) and the Computer Center of Leipzig University provide persistent backups and servers for extracting data. We thank Marvin Hofer and Mykola Medynskyi for technical preparation. This work was partially supported by grants from the Federal Ministry for Economics and Climate Action (BMWK) for the LOD-GEOSS Project (03EI1005E), PenFLaaS (100594042) as well as for the PLASS Project (01MD19003D).

GSoC2023 – Call for Contributors https://www.dbpedia.org/blog/gsoc2023-call-for-contributors/ Thu, 02 Mar 2023 10:02:40 +0000

The post GSoC2023 – Call for Contributors appeared first on DBpedia Association.

Are you a student looking for a summer experience that combines coding skills with open source development? Then look no further than the Google Summer of Code program 2023, where you can join forces with DBpedia to help advance the state of the art in semantic web technologies. Build your skills and gain valuable experience while making a real impact on the tech community!

For the 12th year in a row, we have been accepted to be part of this incredible program to support young ambitious developers who want to work with open-source organizations like DBpedia. So far, each year has brought us new project ideas, many amazing students and great project results that shaped the future of DBpedia. Even though Covid-19 changed a lot in the world, it couldn’t shake Google Summer of Code (GSoC) much. The program, designed to mentor youngsters from afar, is almost too perfect for us. One of the advantages of GSoC, especially in times like these, is the chance to work on projects remotely while still getting a first deep dive into open source projects like ours.

DBpedia is now looking for contributors who want to work with us during the upcoming summer months.  

What is Google Summer of Code?

Google Summer of Code is a global program focused on bringing developers into open source software development. Funds are given to new and beginner open source contributors over 18 years of age to work for two and a half months (or longer) on a specific task. For GSoC newbies, this short video and the information provided on the GSoC website explain all there is to know about GSoC2023.

And this is how it works …

Step 1: Check out one of our projects here or draft your own.
Step 2: Get in touch with our mentors as soon as possible and write up a project proposal of at least 8 pages. Information about our proposal structure and a template are available here.
Step 3: After a selection phase, contributors are matched with a specific project and mentor(s) and start working on the project.

Application Procedure GSoC2023

Further information on the application procedure is available in our DBpedia Guidelines. There you will find information on how to contact us and how to appropriately apply for GSoC2023. Please also note the official GSoC 2023 timeline for your proposal submission and make sure to submit on time. Unfortunately, extensions cannot be granted. The final submission deadline is April 4, 2023, at 18:00 UTC.

Contact

Detailed information on how to apply is available on the DBpedia website. We’ve prepared an information kit for you. Please find all necessary information regarding the student application procedure here.

And in case you still have questions, please do not hesitate to contact us via dbpedia@infai.org.

Stay safe and check Twitter or LinkedIn. Furthermore, you can subscribe to our Newsletter for the latest news and information around DBpedia.

Finally, we are looking forward to your contribution!

Yours DBpedia Association

A year with DBpedia – Retrospective Part 2/2022 https://www.dbpedia.org/blog/a-year-with-dbpedia-retrospective-part-2-2022/ Thu, 12 Jan 2023 09:59:46 +0000

This is the final part of our journey through 2022. In the previous blog post we already presented DBpedia highlights, events and tutorials. Now we want to take a look at the second half of 2022 and give an outlook for 2023.

DBpedia Day @ Semantics Conference in Vienna

Like last year, the DBpedia Day was part of the SEMANTiCS conference 2022 and was held on September 13 at the ARCOTEL Wimberger Wien. Up to 100 DBpedians joined the DBpedia Day. Our CEO Sebastian Hellmann opened this year’s DBpedia Day by presenting the Linkmaster 3000 project. Afterwards, Olaf Hartig from Linköping University gave his fantastic keynote presentation “Towards Querying Heterogeneous Federations of Interlinked Knowledge Graphs”. Furthermore, we organized a member presentation session, a session on DBpedia Science: Linking and Consumption, and a community session. In case you missed the event, all slides are available on our event page, and you can read more in our blog post.

Linkmaster 3000

If it’s not linked — does it even exist? On September 13, 2022, Sebastian Hellmann introduced the Linkmaster 3000 at the SEMANTiCS Conference in Vienna, Austria. It is an online tool, currently in development, for managing Linked Data spaces and helping to integrate them better into the global database formed by Linked Data principles. Two milestones are pending: a large-scale evaluation and the creation of documentation for the launch.

New DBpedia Member

Since 2007, our DBpedia family has been growing steadily – and 2022 was no exception. We are once again pleased to welcome organizations from science and industry to the DBpedia family. Beginning in February, we were able to welcome Anhalt University of Applied Sciences into our midst. This was followed in June by the Leipzig University of Applied Sciences (HTWK Leipzig) and the Technical University of Madrid. And last but not least, DALICC joined the DBpedia family in November, bringing it to 34 members.

Right now we are excited to see who will become a new member this year and we are already looking forward to it.

DBpedia Tutorial @ the KGSWC 2022

On November 17, 2022, we organized a free DBpedia Knowledge Graph tutorial at the Knowledge Graph and Semantic Web Conference (KGSWC) 2022. As part of the conference, this year’s International Winter School was held in Madrid, Spain, under the theme “KnowledgeGraphs: Third Wave of AI”. Around 33 participants joined the tutorial, which was organized by Milan Dojchinovski and Jan Forberg. If you want to know more about the tutorial, you can find the slides here: https://tinyurl.com/WinterSchool22

DBpedia Snapshot 2022-09 Release

On October 28, 2022, we announced the immediate availability of a new edition of the free and publicly accessible SPARQL Query Service Endpoint and Linked Data Pages, for interacting with the new Snapshot Dataset. Since the last release we have made a few changes. Thanks to GSoC 2022 (credits to Celian Ringwald), it includes the new Abstract Extractor. In addition, work is still in progress on smoothing community issue reporting and fixing on GitHub. The full release description, including further statistics, can be found at https://www.dbpedia.org/blog/dbpedia-snapshot-2022-09-release/.

We do hope we will meet you and some new faces during our events next year. The DBpedia Association wants to get to know you because DBpedia is a community effort and would not continue to develop, improve and grow without you. We plan to have meetings or tutorials at the Data Week in Leipzig, the LDK conference, and SEMANTiCS’23 conference. We wish you a happy New Year!

Stay safe and check Twitter, Instagram and LinkedIn or subscribe to our Newsletter for the latest news and information.

Yours,

Emma & Julia

on behalf of the DBpedia Association

The post A year with DBpedia – Retrospective Part 2/2022 appeared first on DBpedia Association.

]]>
DBpedia Snapshot 2022-09 Release https://www.dbpedia.org/blog/dbpedia-snapshot-2022-09-release/ Fri, 28 Oct 2022 12:28:10 +0000 https://www.dbpedia.org/?p=5483

We are pleased to announce immediate availability of a new edition of the free and publicly accessible SPARQL Query Service Endpoint and Linked Data Pages, for interacting with the new Snapshot Dataset.

News since DBpedia Snapshot 2022-03

  • New Abstract Extractor due to GSOC 2022 (credits to Celian Ringwald)
  • Work in progress: Smoothing community issue reporting and fixing on GitHub

What is the “DBpedia Snapshot” Release?

Historically, this release has been associated with many names: “DBpedia Core”, “EN DBpedia”, and — most confusingly — just “DBpedia”. In fact, it is a combination of —

  • EN Wikipedia data — A small, but very useful, subset (~ 1 Billion triples or 14%) of the whole DBpedia extraction using the DBpedia Information Extraction Framework (DIEF), comprising structured information extracted from the English Wikipedia plus some enrichments from other Wikipedia language editions, notably multilingual abstracts in ar, ca, cs, de, el, eo, es, eu, fr, ga, id, it, ja, ko, nl, pl, pt, sv, uk, ru, zh.
  • Links — 62 million community-contributed cross-references and owl:sameAs links to other linked data sets on the Linked Open Data (LOD) Cloud that make it possible to effectively find and retrieve further information from the largest decentralized, change-sensitive knowledge graph on earth, which has formed around DBpedia since 2007.
  • Community extensions — Community-contributed extensions such as additional ontologies and taxonomies.

Release Frequency & Schedule

Going forward, releases will be scheduled for the 1st of February, May, August, and November (with +/- 5 days tolerance), and are named using the same date convention as the Wikipedia Dumps that served as the basis for the release. An example of the release timeline is shown below:

  • September 6–8: Wikipedia dumps for June 1 become available on https://dumps.wikimedia.org/
  • September 8–20: Download and extraction with DIEF
  • September 20–November 1: Post-processing and quality-control period
  • November 1–November 15: Linked Data and SPARQL endpoint deployment

Data Freshness

Given the timeline above, the EN Wikipedia data of DBpedia Snapshot has a lag of 1-4 months. We recommend the following strategies to mitigate this:

  1. DBpedia Snapshot as a kernel for Linked Data: Following the Linked Data paradigm, we recommend using the Linked Data links to other knowledge graphs to retrieve high-quality and recent information. DBpedia’s network consists of the best knowledge engineers in the world, working together, using linked data principles to build a high-quality, open, decentralized knowledge graph network around DBpedia. Freshness and change-sensitivity are two of the greatest data-related challenges of our time, and can only be overcome by linking data across data sources. The “Big Data” approach of copying data into a central warehouse is inevitably challenged by issues such as co-evolution and scalability.
  2. DBpedia Live: Wikipedia is unmistakably the richest, most recent body of human knowledge and source of news in the world. DBpedia Live is just minutes behind edits on Wikipedia, which means that as soon as any of the 120k Wikipedia editors press the “save” button, DBpedia Live will extract fresh data and update. DBpedia Live is currently in tech preview status and we are working towards a highly available and reliable business API with support. DBpedia Live consists of the DBpedia Live Sync API (for syncing into any kind of on-site databases), Linked Data and SPARQL endpoint.
  3. Latest-Core is a dynamically updating Databus Collection. Our automated extraction robot “MARVIN” publishes monthly dev versions of the full extraction, which are then refined and enriched to become Snapshot.  

Data Quality & Richness

We would like to acknowledge the excellent work of Wikipedia editors (~46k active editors for EN Wikipedia), who are ultimately responsible for collecting information in Wikipedia’s infoboxes, which are refined by DBpedia’s extraction into our knowledge graphs. Wikipedia’s infoboxes are steadily growing each month and according to our measurements grow by 150% every three years. EN Wikipedia’s infoboxes even doubled in this timeframe. This richness of knowledge drives the DBpedia Snapshot knowledge graph and is further potentiated by synergies with linked data cross-references. Statistics are given below.

Data Access & Interaction Options

Linked Data

Linked Data is a principled approach to publishing RDF data on the Web that enables interlinking data between different data sources, courtesy of the built-in power of Hyperlinks as unique Entity Identifiers.

HTML pages comprising Hyperlinks that conform to Linked Data Principles are one of the methods of interacting with data provided by the DBpedia Snapshot, be it manually via the web browser or programmatically using REST interaction patterns via the https://dbpedia.org/resource/{entity-label} pattern. Naturally, we encourage Linked Data interactions, while also expecting user-agents to honor the cache-control HTTP response header for massive crawl operations. See the instructions for accessing Linked Data, available in 10 formats.
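The programmatic access pattern described above can be sketched in Python. This is a minimal sketch, not official client code: the helper names `resource_url` and `linked_data_request` are hypothetical, and it assumes that a label maps to a resource name by replacing spaces with underscores and that the server answers content negotiation for `text/turtle`. The code only prepares the request; sending it (for example with `urllib.request.urlopen`) is left to the caller.

```python
from urllib.parse import quote
from urllib.request import Request

# Base of the https://dbpedia.org/resource/{entity-label} pattern named in the text.
RESOURCE_BASE = "https://dbpedia.org/resource/"


def resource_url(label: str) -> str:
    """Build a resource URL from a human-readable label (assumed mapping)."""
    # Assumption: spaces become underscores; other special characters are percent-encoded.
    name = label.strip().replace(" ", "_")
    return RESOURCE_BASE + quote(name, safe="_")


def linked_data_request(label: str, accept: str = "text/turtle") -> Request:
    """Prepare (but do not send) a GET request asking for an RDF serialization."""
    return Request(resource_url(label), headers={"Accept": accept})


if __name__ == "__main__":
    req = linked_data_request("Leipzig University")
    print(req.full_url)
    print(req.get_header("Accept"))
```

When crawling many resources this way, a client should inspect the cache-control response header mentioned above before re-fetching a page.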

SPARQL Endpoint

This service enables some astonishing queries against Knowledge Graphs derived from Wikipedia content. The Query Services Endpoint that makes this possible is identified by http://dbpedia.org/sparql, and it currently handles 7.2 million queries daily on average. See powerful queries and instructions (incl. rates and limitations).

An effective Usage Pattern is to filter a relevant subset of entity descriptions for your use case via SPARQL and then combine with the power of Linked Data by looking up (or de-referencing) data via owl:sameAs property links en route to retrieving specific and recent data from across other Knowledge Graphs across the massive Linked Open Data Cloud.
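The usage pattern above (select entities via SPARQL, then follow owl:sameAs links) could start from a request like the following. This is a sketch under stated assumptions: `same_as_query` and `sparql_request` are hypothetical helper names, the example resource IRI is chosen for illustration only, and the request relies on the standard `query` parameter of the SPARQL protocol plus an `Accept` header for JSON results. Nothing is sent over the network here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as identified in the text.
ENDPOINT = "http://dbpedia.org/sparql"


def same_as_query(resource_iri: str, limit: int = 10) -> str:
    """Return a SPARQL query listing owl:sameAs targets of one resource."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        f"SELECT ?other WHERE {{ <{resource_iri}> owl:sameAs ?other }} "
        f"LIMIT {int(limit)}"
    )


def sparql_request(query: str) -> Request:
    """Prepare (but do not send) a GET request for the endpoint."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    # Illustrative resource IRI; replace it with an entity relevant to your use case.
    request = sparql_request(same_as_query("http://dbpedia.org/resource/Leipzig"))
    print(request.full_url)
```

Each IRI returned for `?other` can then be de-referenced in the other knowledge graph to retrieve more specific or more recent data.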

Additionally, DBpedia Snapshot dumps and additional data from the complete collection of datasets derived from Wikipedia are provided by the DBpedia Databus for use in your own SPARQL-accessible Knowledge Graphs.

DBpedia Ontology

This Snapshot Release was built with DBpedia Ontology (DBO) version: https://databus.dbpedia.org/ontologies/dbpedia.org/ontology--DEV/2021.11.08-124002. We thank all DBpedians for their contributions to the ontology and the mappings. See documentation and visualizations, class tree and properties, wiki.

DBpedia Snapshot Statistics

Overview. Overall the current Snapshot Release contains more than 850 million facts (triples).

The DBpedia ontology is the heart of DBpedia. Our community is continuously contributing to the DBpedia ontology schema and the DBpedia infobox-to-ontology mappings by actively using the DBpedia Mappings Wiki.

The current Snapshot Release utilizes a total of 55 thousand properties, of which 1377 are defined by the DBpedia ontology.

Classes. Knowledge in Wikipedia is constantly growing at a rapid pace. We use the DBpedia Ontology Classes to measure the growth: Total number in this release (in brackets we give: a) growth to the previous release, which can be negative temporarily and b) growth compared to Snapshot 2016-10):

  • Persons: 1833500 (1.02%, 1.15%)
  • Places: 757436 (1.01%, 1842.91%), including but not limited to 597548 (1.01%, 5584.56%) populated places
  • Works: 619280 (1.01%, 628.71%), including but not limited to
    • 158895 (1.01%, 1.40%) music albums
    • 147192 (1.02%, 16.25%) films
    • 25182 (1.01%, 12.71%) video games
  • Organizations: 350329 (1.01%, 110.83%), including but not limited to
    • 88722 (1.01%, 2.28%) companies
    • 64094 (0.99%, 64094.00%) educational institutions
  • Species: 1955775 (1.01%, 325962.50%)
  • Plants: 4977 (0.64%, 1.10%)
  • Diseases: 10837 (1.02%, 8.74%)

Detailed Growth of Classes: The image below shows the detailed growth for one class. Click on the links for other classes: Place, PopulatedPlace, Work, Album, Film, VideoGame, Organisation, Company, EducationalInstitution, Species, Plant, Disease. For further classes, adapt the query by replacing the <http://dbpedia.org/ontology/CLASS> URI. Note that 2018 was a development phase with some failed extractions. The stats were generated with the Databus VOID Mod.
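To illustrate what replacing the class URI looks like, here is a hedged sketch that builds a plain SPARQL count query for one ontology class. It is not the Databus VOID Mod query used for the statistics above, so its numbers may differ from those figures; the helper name `class_count_query` is hypothetical.

```python
# Namespace of the <http://dbpedia.org/ontology/CLASS> URIs mentioned in the text.
ONTOLOGY_NS = "http://dbpedia.org/ontology/"


def class_count_query(class_name: str) -> str:
    """Return a SPARQL query counting entities typed with one ontology class."""
    # Restrict input to simple names such as "Film" to avoid building a broken query.
    if not class_name.isalnum():
        raise ValueError(f"unexpected class name: {class_name!r}")
    return (
        "SELECT (COUNT(DISTINCT ?entity) AS ?total)\n"
        f"WHERE {{ ?entity a <{ONTOLOGY_NS}{class_name}> }}"
    )


if __name__ == "__main__":
    for name in ("Film", "VideoGame", "Disease"):
        print(class_count_query(name))
        print()
```

The resulting query text can be submitted to a SPARQL endpoint that hosts the Snapshot data.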

Links. Linked Data cross-references between decentralized datasets are the foundation and access point to the Linked Data Web. The latest Snapshot Release provides over 130.6 million links from 7.62 million entities to 179 external sources.

Top 11

33,860,047 http://www.wikidata.org

7,147,970 https://global.dbpedia.org 

4,308,772 http://yago-knowledge.org 

3,832,100 http://de.dbpedia.org 

3,704,534 http://fr.dbpedia.org 

2,971,751 http://viaf.org 

2,912,859 http://it.dbpedia.org 

2,903,130 http://es.dbpedia.org 

2,754,466 http://fa.dbpedia.org 

2,571,787 http://sr.dbpedia.org 

2,563,793 http://ru.dbpedia.org

Top 10 without DBpedia namespaces

33,860,047 http://www.wikidata.org

4,308,772 http://yago-knowledge.org 

2,971,751 http://viaf.org 

1,687,386 http://d-nb.info 

609,604 http://sws.geonames.org 

596,134 http://umbel.org 

533,320 http://data.bibliotheken.nl 

430,839 http://www.w3.org 

417,034 http://musicbrainz.org 

104,433 http://linkedgeodata.org

DBpedia Extraction Dumps on the Databus

All extracted files are reachable via the DBpedia account on the Databus. The Databus has two main structures:

Snapshot Download. For downloading DBpedia Snapshot, we prepared this collection, which also includes detailed release notes: https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03

The collection is roughly equivalent to http://downloads.dbpedia.org/2016-10/core/

Collections can be downloaded in many different ways. Some download modalities, such as bash script, SPARQL, and plain URL list, are found in the tabs at the collection. Files are provided as bzip2-compressed N-Triples files. In case you need a different format or compression, you can also use the “Download-As” function of the Databus Client (GitHub), e.g. -s $collection -c gzip would download the collection and convert it to GZIP during download.
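Once a dump file has been downloaded, it can be read without unpacking it to disk first. The following is a minimal sketch using only the Python standard library; the function names `iter_ntriples` and `count_triples` are hypothetical, and the sketch assumes one triple per line, which is how N-Triples is defined. It does not validate the triples.

```python
import bz2
import io


def iter_ntriples(fileobj):
    """Yield non-empty, non-comment lines from a bzip2-compressed N-Triples stream."""
    # bz2.open accepts a path or an open binary file object; "rt" decodes to text.
    with bz2.open(fileobj, mode="rt", encoding="utf-8") as text:
        for line in text:
            line = line.strip()
            if line and not line.startswith("#"):
                yield line


def count_triples(fileobj) -> int:
    """Count triple lines without loading the whole file into memory."""
    return sum(1 for _ in iter_ntriples(fileobj))


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a downloaded .ttl.bz2 or .nt.bz2 file.
    sample = bz2.compress(
        b'<http://dbpedia.org/resource/Leipzig> '
        b'<http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@en .\n'
    )
    print(count_triples(io.BytesIO(sample)))
```

For a real dump, pass the local file path instead of the in-memory sample, for example `count_triples("labels_lang=en.ttl.bz2")`, where the file name is only a placeholder.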

Replicating DBpedia Snapshot on your server can be done via Docker, see https://hub.docker.com/r/dbpedia/virtuoso-sparql-endpoint-quickstart

git clone https://github.com/dbpedia/virtuoso-sparql-endpoint-quickstart.git

cd virtuoso-sparql-endpoint-quickstart

COLLECTION_URI=https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-09 VIRTUOSO_ADMIN_PASSWD=password docker-compose up

Download files from the whole DBpedia extraction. The whole extraction consists of approx. 20 Billion triples and 5000 files created from 140 languages of Wikipedia, Commons and Wikidata. They can be found in https://databus.dbpedia.org/dbpedia/(generic|mappings|text|wikidata)

You can copy-edit a collection and create your own customized collections via “Actions” -> “Copy Edit”, e.g. you can Copy Edit the snapshot collection above, remove some files that you do not need and add files from other languages. Please see the Rhizomer use case: Best way to download specific parts of DBpedia. Of course, this only refers to the archived dumps on the Databus for users who want to bulk download and deploy into their own infrastructure. Linked Data and SPARQL allow for filtering the content using a small data pattern.

Acknowledgments

First and foremost, we would like to thank our open community of knowledge engineers for finding & fixing bugs and for supporting us by writing data tests. We would also like to acknowledge the DBpedia Association members for constantly innovating the areas of knowledge graphs and linked data and pushing the DBpedia initiative with their know-how and advice. OpenLink Software supports DBpedia by hosting SPARQL and Linked Data; University Mannheim, the German National Library of Science and Technology (TIB) and the Computer Center of University Leipzig provide persistent backups and servers for extracting data. We thank Marvin Hofer and Mykola Medynskyi for technical preparation. This work was partially supported by grants from the Federal Ministry for Economic Affairs and Energy of Germany (BMWi) for the LOD-GEOSS Project (03EI1005E), as well as for the PLASS Project (01MD19003D).

The post DBpedia Snapshot 2022-09 Release appeared first on DBpedia Association.

]]>
Retrospective: Google Summer of Code 2022 https://www.dbpedia.org/blog/retrospective-google-summer-of-code-2022/ Tue, 04 Oct 2022 08:30:40 +0000 https://www.dbpedia.org/?p=5467

We received 11 project proposals for this GSoC edition.

For the 11th year in a row, we have been able to support and guide young, ambitious developers who joined us as an open source organization to work on a programming project over the summer. Each year we have been inspired by new project ideas, many amazing students, and mostly great project results that have shaped the future of DBpedia. One of the advantages of Google Summer of Code 2022 is, especially in times like these, the chance to work on projects remotely while still getting a first deep dive into open source projects like DBpedia.

Meet our Google Summer of Code 2022 contributors and their projects

Throughout the summer program, our contributors worked intensely on their challenging DBpedia projects with great outcomes to show to the public. Projects ranged from extending a neural extraction framework to enhancing DBpedia with image-based querying. If you want deeper insights into our GSoC contributors’ work, you can find their blogs and repos in the following list. Check them out!

We started out with five contributors who committed to GSoC projects. However, in the course of the summer, one dropped out and did not pass the final evaluation. In the end, we had four finalists who made it through the program. If you are interested in the project “Understanding and Optimizing DBpedia Question Answering through Explanations”, please check GitHub.

Thanks to mentors

Thanks to all our mentors around the world for joining us in this endeavour and for mentoring with kindness and technical expertise. A huge shout-out to those who have been by our side for so many years in a row. Thank you all again for spending more than 3.5 months working with this year’s GSoC contributors and helping them become better open source contributors!

Mentor Summit

In previous years, you might have noticed that we always organized a little lottery to decide which mentor or organization admin could join the annual GSoC mentor summit. As this year’s event will take place online, it is open to all organization admins and mentors alike. The Google Summer of Code 2022 Virtual Mentor Summit takes place on November 4, 2022 from about 8am to 12pm PT. This year we hope all our mentors will find the time to join and exchange ideas with fellow mentors from dozens of open source projects.

After GSoC is before the next GSoC

We cannot wait for the 2023 edition. If you are an ambitious student who is interested in open source development and working with DBpedia, you are more than welcome to either contribute your own project idea or apply for one of the project ideas we will offer starting in early 2023. If you would like to know where previous mentors and contributors are now working, please read our GSoC blog post about the last 10 years of DBpedia at GSoC.

If you would like to mentor a project, do not hesitate to get in touch with us via dbpedia@infai.org.

Stay safe and check Twitter or LinkedIn. Furthermore, you can subscribe to our Newsletter for the latest news and information around DBpedia.

Emma & Julia

on behalf of the DBpedia Association

The post Retrospective: Google Summer of Code 2022 appeared first on DBpedia Association.

]]>
DBpedia Snapshot 2022-06 Release https://www.dbpedia.org/blog/dbpedia-snapshot-2022-06-release/ Thu, 28 Jul 2022 10:57:09 +0000 https://www.dbpedia.org/?p=5403

The DBpedia team is heartbroken to inform you that there will be NO new Snapshot Release of version 2022-06.

You can still use the last Snapshot Release of version 2022-03.

We want to address the current problem and future solutions in the following.

We encountered several new issues, but the major problem is that the current version of DBpedia’s Abstract Extractor is no longer working. Wikimedia/Wikipedia seems to have tightened the requests-per-second restrictions on their old API.

As a result, we could not extract any version of English abstracts for April, May, or June 2022 (not to mention the other 138 languages). We decided not to publish a mixed version with overlapping core data that is older than three months (e.g., abstracts and mapping-based data).

For a solution, a GSOC project was already promoted in early 2022 that specifies the task of improving abstract extraction. The project was accepted and is currently running. We tested several new promising strategies to implement a new Abstract Extractor.

We will give further status updates on this project in the future.

The announcement of the next Snapshot Release (2022-09) is scheduled for November 1.

The post DBpedia Snapshot 2022-06 Release appeared first on DBpedia Association.

]]>