[bioontology-support] [BioPortal] Feedback from Solene Grosdidier


support

Name: Solene Grosdidier

Email: [hidden email]

Location: https://bioportal.bioontology.org/mappings


Feedback:

Dear Bioportal team,

I am trying to retrieve, through the API, all the mappings available in BioPortal between SNOMEDCT and NCIT. Unfortunately, after the 321st page I get empty JSON for all following pages (reproducible on 2 different computers). Below is a copy-paste of my script.

#---------------------------------------------------------------------------------------------------------

#!/usr/bin/python3

import urllib.request, urllib.error, urllib.parse
import simplejson as json

REST_URL = "http://data.bioontology.org"
API_KEY = "2c84c2c2-3510-46fa-b7af-732659784401"

def get_json(url):
    opener = urllib.request.build_opener()
    opener.addheaders = [('Authorization', 'apikey token=' + API_KEY)]
    return json.loads(opener.open(url).read())

mapping = get_json(REST_URL + "/mappings?ontologies=NCIT,SNOMEDCT")

pages = mapping["pageCount"]
print(str(pages))

for i in range(1, pages + 1):
    print("page: " + str(i))
    mapping2 = get_json(REST_URL + "/mappings?ontologies=NCIT%2CSNOMEDCT&page=" + str(i))
    try:
        for element in mapping2["collection"]:
            print(element["source"] + "\t" + element["classes"][0]["@id"] + "\t" + element["classes"][1]["@id"])
    except KeyError:
        print("NO COLLECTION")
#-------------------------------------------------------------------------------------------------------------

Can you help me understand what is happening?
Thank you very much for your help. I am looking forward to hearing from you.

Best,
Solene Grosdidier


_______________________________________________
bioontology-support mailing list
[hidden email]
https://mailman.stanford.edu/mailman/listinfo/bioontology-support

Re: [bioontology-support] [BioPortal] Feedback from Solene Grosdidier

Michael Dorf
Hi Solene,

What you’re describing is a known issue: https://github.com/ncbo/ontologies_linked_data/issues/88.

Background: At some point in the past, we implemented a system that prevents expensive COUNT queries from running live against our 4store backend. These queries used to really bog down our servers, often resulting in downtime. They were executed by paged REST services, such as the one that retrieves all mappings for a given ontology: to determine the correct number of pages for a given call, the system would first execute a COUNT query and include the result in the output. The new system pre-caches these counts, so when a paged service call is made, the count is retrieved from a static repository. Unfortunately, there appears to be a bug in this process that triggers the behavior you are seeing.

For your specific example, it's best to simply iterate through ALL pages of available mappings until you hit an empty collection, instead of relying on the reported totalCount.
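A minimal sketch of this iterate-until-empty approach (not an official BioPortal client): `fetch` is any callable that takes a URL and returns the parsed JSON page, in practice the `get_json` helper from the original script, which also lets the paging logic be exercised offline.

```python
import json
import urllib.request

REST_URL = "http://data.bioontology.org"
API_KEY = "your-api-key"  # placeholder; substitute a real BioPortal API key

def get_json(url):
    # Same authorization scheme as the original script.
    opener = urllib.request.build_opener()
    opener.addheaders = [('Authorization', 'apikey token=' + API_KEY)]
    return json.loads(opener.open(url).read())

def iter_mappings(fetch, base_url):
    """Yield mapping records page by page, stopping at the first empty
    'collection' rather than trusting the reported pageCount/totalCount."""
    page = 1
    while True:
        data = fetch(base_url + "&page=" + str(page))
        collection = data.get("collection") or []
        if not collection:
            break  # empty page: no more mappings remain
        yield from collection
        page += 1

# Live usage would look like:
# for m in iter_mappings(get_json, REST_URL + "/mappings?ontologies=NCIT,SNOMEDCT"):
#     print(m["classes"][0]["@id"], m["classes"][1]["@id"])
```

Passing the fetcher in as a parameter keeps the stopping rule separate from the HTTP plumbing, so the loop behaves the same whether the server's page counts are accurate or not.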

Thanks again for your report. Hope this works as a workaround for what you are trying to accomplish.

Michael



Re: [bioontology-support] [BioPortal] Feedback from Solene Grosdidier

S. Grosdidier
Hi Michael,

Thank you very much for your answer.
I am not sure I understand: I already use an iterator to go through all pages of available mappings, and the first empty collection occurs at page 362. Even if I cannot rely on the reported totalCount, I should still retrieve the 48,243 mappings available between NCIT and SNOMEDCT, as indicated in the JSON response and in agreement with the BioPortal web interface (https://bioportal.bioontology.org/ontologies/NCIT?p=mappings). Unfortunately, after 361 pages I get only a subset of the whole information (15,953 pairs of NCIT-SNOMEDCT codes). Given the number of pairs returned per page, 48,243 mappings should indeed span 965 pages (as indicated in the JSON response).

I had to run the exact same request for mappings between SNOMEDCT and MedDRA a month ago, and it unexpectedly worked after several days; my last email to support went unanswered, and I believed at the time that somebody had fixed the bug.

I understand that this query is heavy and slows down your servers. The system even seems to be down this morning (I hope not because of me). Would it be possible to make these mappings temporarily available somewhere as a compressed file? I only need the NCIT-SNOMEDCT code pairs.


I am looking forward to hearing from you.
Best,
Solene




Solène Grosdidier

Scientific Researcher

Department of Medical Informatics

Email:    [hidden email]

Phone:    +31 (0) 10 704 4879

Mailing address:   P.O.Box 2040 – 3000 CA

Visiting address:  Wytemaweg 80 – 3015 CN | Room Na-2609

                          Rotterdam, The Netherlands

 

 



Re: [bioontology-support] [BioPortal] Feedback from Solene Grosdidier

Michael Dorf
Hi Solene,

To clarify, please ignore the pageCount value in your code. Instead, iterate over all the pages for a given set of ontology mappings and stop when you encounter an empty page; that simply means there are no more mappings remaining. Example:

http://data.bioontology.org/mappings?ontologies=NCIT%2CSNOMEDCT&page=319 - This is the last full page containing the default number of 50 records inside the “collection” array.
http://data.bioontology.org/mappings?ontologies=NCIT%2CSNOMEDCT&page=320 - This page contains only two remaining records inside the “collection” array.
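Since a full page holds the default 50 records, a client can also stop as soon as a page comes back short, saving the extra request for the trailing empty page. A minimal sketch of that refinement, where `get_page` is a hypothetical callable mapping a page number to the parsed JSON response:

```python
def collect_until_short_page(get_page, page_size=50):
    """Collect mapping records across pages, stopping once a page returns
    fewer than page_size records (50 is the service's default page size),
    which marks it as the final page."""
    records = []
    page = 1
    while True:
        collection = get_page(page).get("collection") or []
        records.extend(collection)
        if len(collection) < page_size:
            return records
        page += 1
```

In the example above this would stop after page 320 (the two-record page) without ever requesting page 321.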

As far as downloading mappings as a file: unfortunately, this isn't an option at this point. We've had similar requests in the past and have an open GitHub ticket tracking this issue:


We are a small team, and our development schedule is highly selective. Unfortunately, this issue hasn't yet made it to the top of our priority list.

Thanks,

Michael

